LONG TERMINAL REPEAT, ENHANCER, AND INSULATOR
SEQUENCES FOR USE IN RECOMBINANT VECTORS 
BACKGROUND OF THE INVENTION 

The human endogenous retroviruses (HERVs) were inserted into the
germ cells of primates millions of years ago and have remained as an integral
part of the primate genomes during evolution. In addition to the proviruses,
solo LTRs are also dispersed throughout the human genome (Wilkinson et al,
1994; Lower et al, 1996). The solo LTRs contain the U3, R and U5 regions
(Temin, 1982) but no internal gag, pol and env genes. Together, the HERVs
and the solo LTRs comprise approximately 5% of the human genome and
belong to the category of middle repetitive DNAs characterized as
retrotransposons (A.F. Smit, 1996; Henikoff et al, 1997).

The ERV-9 proviruses, containing 30-50 members, constitute one of
many families of the HERVs (Wilkinson et al, 1993; Lower et al, 1996). In
addition to the proviruses, solo ERV-9 LTRs with a copy number of
3000-4000 have been found in the human genome (Henthorn et al, 1986; La
Mantia, 1991; Schlessiger, 1992). The ERV-9 retrotransposons were
inserted into the primate genome probably as early as ten million years ago
(Di Cristofano et al, 1995). The retrotransposons have been suggested to be
selfish DNAs irrelevant to the cellular functions of the hosts (Doolittle and
Sapienza, 1980). However, recent findings indicate that the enhancer and
promoter elements in the U3 region of the LTRs (Lenz et al, 1984; Speck et
al, 1990) initiate and promote the transcription of host genes located
immediately downstream of the LTRs and may thus serve relevant cellular
functions (Stavenhagen and Robins, 1988; Feuchter et al, 1992; Goodchild
et al, 1992; Ting et al, 1992; Schulte et al, 1996).

The human β-like globin genes consist of the embryonic ε, the fetal
Gγ and Aγ, and the adult δ and β genes located on chromosome 11 in a
transcriptional order of 5'-ε-Gγ-Aγ-δ-β-3' (Efstratiadis et al, 1980). The
transcription of these genes is regulated by the far upstream Locus Control
Region (LCR), which is defined by four erythroid-specific DNase I
hypersensitive sites HS 1, 2, 3 and 4 (Tuan et al, 1985; Forrester et al, 1987;
Grosveld et al, 1987; Dhar et al, 1990). The LCR between HS1 and HS4 is
present in other mammals from mouse to galago and comprises the major
functional component of the LCR (reviewed by Hardison et al, 1997). A
ubiquitous HS5 site has been identified further upstream of the HS 1-4 sites
(Tuan et al, 1985; Dhar et al, 1990) in the apparent 5' boundary area of the
LCR.

Enhancer elements are cis-acting and increase the level of
transcription of an adjacent gene from its promoter in a fashion that is
relatively independent of the position and orientation of the enhancer
element. In fact, Khoury and Gruss, 1983, Cell 33:313, state that "the
remarkable ability of enhancer sequences to function upstream from, within,
or downstream from eukaryotic genes distinguishes them from classical
promoter elements ..." and suggest that certain experimental results
indicate that "enhancers can act over considerable distances (perhaps >10
kb)."

Enhancer elements have been identified in a number of viruses,
including polyoma virus, papilloma virus, adenovirus, retrovirus, hepatitis
virus, cytomegalovirus, herpes virus, papovaviruses, such as simian virus 40
(SV40) and BK, and in many non-viral genes, such as within mouse
immunoglobulin gene introns. Enhancer elements may also be present in a
wide variety of other organisms. Host cells often react differently to different
enhancer elements. This cellular specificity indicates that host gene products
interact with the enhancer element during gene expression.

Although gene replacement by homologous recombination could be
used instead of integrating vectors, this approach is not yet technically
practical because of the very low success rate of the homologous
recombination events and the inability to culture the pluripotent stem cells
required for this approach.

BRIEF SUMMARY OF THE INVENTION 
Disclosed are an enhancer, insulator, and promoter from the HS5
region in the 5' boundary area of the locus control region of human β-like
globin genes. These transcription control sequences can be used to control
expression of any desired gene of interest and can be used in any vector for
this purpose. The control sequences are derived from the area in and around
the U3 region of a solitary endogenous retrovirus (ERV) 9 long terminal
repeat (LTR).

Also disclosed are methods of expressing any gene of interest. For
this purpose, the control sequences can be operably linked to the gene of
interest (and operably linked to each other). The disclosed enhancers,
insulators, and promoters can also be used with any other control sequences.
Preferably, the control sequences are used in vectors to obtain expression of
a gene of interest in a cell, including cells in animals.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a diagram of the location and structure of the ERV LTR in
the boundary area of the β-globin LCR. The top line shows the human
β-like globin gene locus. Solid boxes are the embryonic ε-, fetal γ- and adult
δ- and β-globin genes. The vertical arrows indicate locations of the DNase I
hypersensitive sites HS 1, 2, 3, 4 and 5. The hatched box 5' of the HS5 site
is a solo ERV-9 LTR. The hatched box 3' of the β-globin gene is a second
copy of the ERV-9 LTR located 30 kb 3' of the β-globin gene (Henthorn et
al, 1986; Anagnou et al, 1995). The middle line is the enlarged 5' boundary
area drawn to scale according to the 1 kb scale bar. Open, hatched and gray
boxes are respective locations of the HS5 site, ERV-9 LTR and an arbitrary
upstream region (Ups) which was used as a control sequence for the LTR in
reporter gene assays and RT-PCR studies. The bottom line is the structure of
the LTR. Short horizontal arrows are the 14 short tandem repeats in the U3
region. Solid bar is the R region. Long horizontal arrows are the three
longer repeats in the U5 region.

Figures 2A and 2B are the sequence of the 5'HS5 LTR in the 5'sl .4
phage DNA clone from K562 cells (SEQ ID NO:1). The four bases GTAT
with the heavy overline and underline located at the 5' and 3' ends of the
LTR are the presumed integration site of the LTR in the human genomic
DNA. The horizontal arrows in U3 are the 14 tandem repeats of 37-41 bases
in the U3 region. Angled arrow is the presumed transcriptional initiation site
in the LTR, marking the beginning of the R region. The long horizontal
arrows in the U5 region are the three repeats of 70 bases in U5. Arrowheads
connected to dotted overlines are locations of the PCR primers used in DNA
PCR and RT-PCR studies discussed in Example 1. Directions of the
arrowheads are the 5' to 3' direction of the primers.

Figure 3 is a comparison of the sequences of the U3 repeats. The top
line is the organization of the four subtype U3 repeats 1, 2, 3 and 4 in 5'HS5
LTR. P is the promoter in the U3 region. In the middle are the sequences of
the subtype repeats 1, 2, 3, and 4 (SEQ ID NOs:8, 9, 10, and 11,
respectively). Underlined bases are the GATA, CCAAT, CACCC or
CCACC motifs. At the bottom are consensus sequences of the U3 repeats in
different ERV-9 LTRs. 5'HS5 (SEQ ID NO:12), 3'β (SEQ ID NO:13) and
LTR2 (SEQ ID NO:14) are the 5'HS5 LTR, the LTR at 25 kb 3' of the
β-globin gene (Henthorn et al, 1986; Anagnou et al, 1995), and the LTR in a
random human DNA clone (Lania et al, 1992), respectively. Lower case
letters separated by slashes indicate polymorphic bases in the U3 repeats.

Figure 4 is a sequence comparison of three U3 promoters and the
ε-globin promoter. At the top is the U3 promoter of the 5'HS5 LTR
(nucleotides 1194 to 1287 of SEQ ID NO:1). The overlined bases are the
equivalent of the TATA box (Strazzullo et al, 1994). Underlined bases are
the DNA motifs found also in the U3 repeats. Angled arrow is the
transcriptional initiation site in LTR2 (La Mantia et al, 1992; Strazzullo et al,
1994) and the presumed transcriptional initiation site in the 5'HS5 LTR. At
the bottom is the sequence alignment of the four promoters in the 5'HS5 LTR
(nucleotides 1194 to 1287 of SEQ ID NO:1), 3'β LTR (SEQ ID NO:2) and
LTR2 (SEQ ID NO:3), respectively. Dashes are DNA base deletions.

Figures 5A-5D are a sequence alignment of the normal human (Hu N;
nucleotides 624 to 1781 of SEQ ID NO:1), truncated human (Hu S; SEQ ID
NO:6) and gorilla (Gori; SEQ ID NO:7) LTRs. Majority bases represent
the consensus DNA sequence among the three LTRs (SEQ ID NO:5).
Numbers between two horizontal lines are the DNA base ruler with base 1
being the first base of the first U3 repeat in the LTRs. Vertical arrows are
the positions of the first base in the U3 repeats. Dots represent the same
bases in the human or gorilla DNAs as those in the consensus sequence.
Dashes represent base deletions. The GTAT bases at positions 1081-84
marked with heavy overline are the integration site of the 5'HS5 LTR in both
human and gorilla DNAs.

Figure 6 is a diagram comparing the structures of the 5'HS5 LTR in
the genomes of human and gorilla and in people of different racial lineages.
Hu N is the human LTR of the normal length with 14 U3 repeats. Hu S is
the human LTR of a shorter length with 11 U3 repeats. Gori is the gorilla
LTR with 5 U3 repeats. Numbers in parentheses are the total number of
bases in the LTRs, including 140 bases of genomic DNA downstream of the
LTR insertion site (the GTAT bases), that were amplified by the PCR
primers. Bent lines in Hu S and Gori are deletions of three and nine
complete U3 repeats in the truncated human and gorilla LTRs, respectively.

Figure 7 is a diagram of the structure of recombinant CAT constructs.
LTR is a 1 kb LTR sequence. Ups is 1.2 kb of DNA upstream of the LTR
(see Figure 1). εp is a 200 bp ε-globin promoter. HS2 is a 0.74 kb HS2
enhancer. HS5 is a 1.2 kb sequence spanning the HS5 site.

Figure 8 is a graph of enhancer and promoter activities (in percent of
substrate converted) of the 5'HS5 LTR in recombinant CAT constructs
Ups-CAT, HS2-εp-CAT and LTR-CAT plasmids transiently transfected into
K562, MEL and HL60 cells. Percent Conv is percentage conversion of the
14C-chloramphenicol substrate by the CAT enzyme produced by the
transfected test plasmid after normalization with respect to a common level
of a co-transfected CMV-β-gal plasmid.

Figure 9 is a graph of enhancer and promoter activities (in percent of
substrate converted) of the 5'HS5 LTR in recombinant CAT plasmids
εp-CAT, HS2-εp-CAT, LTR-CAT, LTR-εp-CAT, HS5-εp-CAT and
LTR-HS5-εp-CAT integrated into the genome of K562 cells. Percent Conv is
the percentage conversion of the 14C-chloramphenicol substrate by the CAT
enzyme produced by the integrated plasmids after normalization with respect
to the per cell copy numbers of the plasmids.

Figure 10 is a diagram of the 5'HS5 LTR in normal human DNA with
14 U3 enhancer repeats. The four horizontal lines 1, 2, 3 and 4 represent the
anticipated RT-PCR fragments amplified respectively by Primer pairs 1-4,
synthesized according to the K562 sequence in Figure 2. Numbers below the
lines are the anticipated sizes in base pairs of the amplified cDNA fragments.

Figure 11 is a diagram of examples of constructs using the disclosed
enhancers and promoters.

Figure 12 is a diagram of examples of constructs using the disclosed 
enhancers and promoters. 

Figure 13 is a diagram of examples of constructs using the disclosed 
insulators. 

DETAILED DESCRIPTION OF THE INVENTION 

Transcription of the human β-like globin genes in erythroid cells is
regulated by the far-upstream locus control region (LCR). Five kilobases of
new upstream DNA were cloned and sequenced in order to define the 5'
border of the LCR. An LTR-retrotransposon belonging to the ERV-9 family
of human endogenous retroviruses was found in the apparent 5' boundary
area of the LCR. This ERV-9 LTR contains an unusual U3 enhancer region
comprised of fourteen tandem repeats with recurrent GATA, CACCC and
CCAAT motifs. This LTR is conserved in human and gorilla, indicating its
evolutionary stability in the genomes of primates. In both recombinant
constructs and the endogenous human genome, the LTR enhancer and
promoter activate the transcription of cis-linked DNA preferentially in
erythroid cells.

Sequencing data of the 5' border region of the LCR reveal a solitary
ERV-9 LTR with the characteristics of a retrotransposon in a location near
the HS5 site (see Figure 1). This 5' HS5 LTR possesses an unusual
sequence feature in the U3 enhancer region which is comprised of fourteen
tandem repeats of a consensus DNA of 41 bases. These U3 repeats as well
as the downstream promoter contain recurrent GATA, CACCC and CCAAT
motifs. This LTR-retrotransposon is conserved with 98-99% sequence
identities in people of different races and in the gorilla, except that some
people have eleven instead of fourteen U3 repeats and the gorilla has only
five U3 repeats. Functional tests with the CAT reporter gene assays
demonstrate that the human 5' HS5 LTR activates the cis-linked CAT gene
and possesses enhancer and promoter activities in erythroid cells. In the
CAT reporter gene assays, the LTR also synergized with and activated the
cis-linked HS5 site. Consistent with these results, RT-PCR studies of
cellular RNAs isolated from human primary cells and cell lines indicate that
the endogenous LTR activates transcription of the downstream R, U5 and the
genomic DNA at a higher level in erythroid than in nonerythroid cells.

Disclosed are enhancers, insulators, and promoters derived from the
HS5 region in the 5' boundary area of the locus control region of β-like
globin genes. These transcription control sequences can be used to control
expression of any desired gene of interest and can be used in any vector for
this purpose. The control sequences are derived from the area in and around
the U3 region of a solitary endogenous retrovirus long terminal repeat
(ERV-9 LTR).

Also disclosed are methods of expressing any gene of interest. For
this purpose, the control sequences can be operably linked to the gene of
interest (and operably linked to each other). The disclosed enhancers,
insulators, and promoters can also be used with any other control sequences.
Preferably, the control sequences are used in vectors to obtain expression of
a gene of interest in a cell, including cells in animals.

Current strategies for gene expression in mammals and mammalian
cells, especially gene therapy of hereditary or acquired blood diseases,
employ retrovirus-mediated gene-transfer techniques. One of the common
problems of this approach has been the extinction of the expression of the
transgenes by the long terminal repeats (LTRs) of the vector flanking the
therapeutic transgene and by the host sequences flanking the LTR-transgenic
cassette. The disclosed enhancers, derived from the powerful enhancer
discovered in the solitary LTR of the ERV-9 human endogenous retrovirus
located in the 5' border of the β-globin Locus Control Region, can alleviate
this problem. The ERV-9 LTR-enhancer is most active in erythroid cells and
can thus be used to replace the LTR in the retroviral vector to avoid the
transcriptional silencing of the transgene and to boost the transcription of the
therapeutic transgene in erythroid progenitor cells. Another problem with
gene expression in animal and mammalian cells, interference from flanking
transcription, can be alleviated using the disclosed insulator. The disclosed
insulators are derived from a stretch of LTR DNA of 600 bases, which
has a very high G and C content of 70% and is located immediately
upstream of the ERV-9 LTR enhancer. The disclosed insulators can be used
to insulate expression cassettes, especially those to be inserted in the genome
of the host cell, from the transcriptional interference and silencing of the
flanking host sequences.
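
As a rough illustration of the G and C content figure quoted above for the insulator, the following sketch computes the G+C fraction of a DNA string and of sliding windows along it. The function names, the window sizes and the toy sequence are assumptions made for illustration only; the toy sequence is not taken from SEQ ID NO:1.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return sum(base in "GC" for base in seq) / len(seq)


def gc_windows(seq: str, window: int = 100, step: int = 50):
    """Yield (1-based start, G+C fraction) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start + 1, gc_fraction(seq[start:start + window])


# Hypothetical toy sequence built to be 70% G+C; not part of SEQ ID NO:1.
toy = "GCGCGCGATA" * 6

if __name__ == "__main__":
    print(f"overall G+C fraction: {gc_fraction(toy):.2f}")
    for start, frac in gc_windows(toy, window=20, step=10):
        print(f"window starting at base {start}: {frac:.2f}")
```

Applied to the insulator stretch of SEQ ID NO:1, a function of this kind would be expected to return a value near 0.70 if the stated composition holds.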

The solitary ERV-9 LTR sequence in the β-globin Locus Control
Region belongs to middle repetitive sequences in the human genome with a
haploid copy number of 3000-4000. The first copy of a solitary ERV-9 LTR
was reported in 1989. The functional significance of the ERV-9 LTRs
dispersed in the human genome may be to transcriptionally activate, and thus
mark, the cis-linked loci of hematopoietic genes and gene families in early
progenitor cells during ontogeny and hematopoietic lineage differentiation.
The specific function of the solo ERV-9 LTR located near the HS5 site in
the 5' border of the human β-globin locus control region (LCR) may be to
initiate transcription of the LCR during early stages of ontogeny; this
transcription of the LCR would then regulate the transcriptional activation of
the further downstream β-like globin genes during erythropoiesis.

Specifically disclosed are nucleic acid molecules comprising all or a
functional portion of the U3 enhancer (nucleotides 595 to 1193 of Figure 2;
nucleotides 595 to 1193 of SEQ ID NO:1), or modified forms of the U3
enhancer, where a functional portion is a portion of the U3 enhancer that
retains enhancer function. Also disclosed are nucleic acid molecules
comprising all or a functional portion of the U3 insulator (nucleotides 5 to
594 of Figure 2; nucleotides 5 to 594 of SEQ ID NO:1), or modified forms of
the U3 insulator, where a functional portion of the U3 insulator is a portion
of the U3 insulator that retains insulator function. Also disclosed are nucleic
acid molecules comprising (1) all or a functional portion of the U3 enhancer
(nucleotides 595 to 1193 of Figure 2; nucleotides 595 to 1193 of SEQ ID
NO:1), or modified forms of the U3 enhancer, operably linked to (2) all or a
functional portion of the U3 insulator (nucleotides 5 to 594 of Figure 2;
nucleotides 5 to 594 of SEQ ID NO:1), or modified forms of the U3
insulator, where a functional portion is a portion of the U3 enhancer that
retains enhancer function and where a functional portion of the U3 insulator
is a portion of the U3 insulator that retains insulator function.

Also disclosed are nucleic acid molecules comprising all or a
functional portion of the U3 promoter (nucleotides 1194 to 1322 of Figure 2;
nucleotides 1194 to 1322 of SEQ ID NO:1), or modified forms of the U3
promoter, where a functional portion of the U3 promoter is a portion of the
U3 promoter that retains promoter function. Also disclosed are nucleic acid
molecules comprising (1) all or a functional portion of the U3 enhancer
(nucleotides 595 to 1193 of Figure 2; nucleotides 595 to 1193 of SEQ ID
NO:1), or modified forms of the U3 enhancer, operably linked to (2) all or a
functional portion of the U3 promoter (nucleotides 1194 to 1322 of Figure 2;
nucleotides 1194 to 1322 of SEQ ID NO:1), or modified forms of the U3
promoter, where a functional portion is a portion of the U3 enhancer that
retains enhancer function and where a functional portion of the U3 promoter
is a portion of the U3 promoter that retains promoter function.

Also disclosed are nucleic acid molecules comprising the U3 R
region (nucleotides 1322 to 1380 of Figure 2; nucleotides 1322 to 1380 of
SEQ ID NO:1), or modified forms of the U3 R region. Also disclosed are
nucleic acid molecules comprising (1) all or a functional portion of the U3
enhancer (nucleotides 595 to 1193 of Figure 2; nucleotides 595 to 1193 of
SEQ ID NO:1), or modified forms of the U3 enhancer, operably linked to (2)
the U3 R region (nucleotides 1322 to 1380 of Figure 2; nucleotides 1322 to
1380 of SEQ ID NO:1), or modified forms of the U3 R region, where a
functional portion is a portion of the U3 enhancer that retains enhancer
function.

Also disclosed are nucleic acid molecules comprising (1) all or a
functional portion of the U3 enhancer (nucleotides 595 to 1193 of Figure 2;
nucleotides 595 to 1193 of SEQ ID NO:1), or modified forms of the U3
enhancer; operably linked to (2) all or a functional portion of the U3
insulator (nucleotides 5 to 594 of Figure 2; nucleotides 5 to 594 of SEQ ID
NO:1), or modified forms of the U3 insulator; and operably linked to (3) all
or a functional portion of the U3 enhancer (nucleotides 595 to 1193 of Figure
2; nucleotides 595 to 1193 of SEQ ID NO:1), or modified forms of the U3



WO 00/23606 PCT/US99/24646 

enhancer; where a functional portion is a portion of the U3 enhancer that 
retains enhancer function, where a functional portion of the U3 insulator is a 
portion of the U3 insulator that retains insulator function, and where a 
functional portion of the U3 promoter is a portion of the U3 promoter that 
retains promoter function.
Enhancers 

The disclosed enhancers have enhancer function. Enhancers function
to increase the transcription from promoters in proximity to the enhancer.
The disclosed enhancers, like many enhancers, can function both upstream
and downstream from a gene, and in either orientation. The disclosed
enhancers are, or are derived from, all or a functional portion of the U3
enhancer (nucleotides 595 to 1193 of Figure 2; nucleotides 595 to 1193 of
SEQ ID NO:1), or modified forms of the U3 enhancer, where a functional
portion is a portion of the U3 enhancer that retains enhancer function. The
disclosed enhancers can be combined with other transcription control
elements, including the disclosed insulators and promoters.

Disclosed are primate 5' HS5 ERV-9 LTR enhancers. In particular,
human and gorilla 5' HS5 ERV-9 LTR enhancers are disclosed. A preferred
form of enhancer is the U3 enhancer present on nucleotides 595 to 1193 of
Figure 2 (nucleotides 595 to 1193 of SEQ ID NO:1). The U3 enhancer is
made up of fourteen repeat units, where each repeat has one of the following
four sequences:

TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG (SEQ
ID NO:8),

TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG (SEQ
ID NO:9),

TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG (SEQ
ID NO:10),

TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA (SEQ ID
NO:11).
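
The recurrent motifs named for the U3 repeats (GATA, CCAAT, CACCC and CCACC) can be located in the repeat units with a simple two-strand scan. The sketch below is illustrative only: the function names are assumptions, and the repeat sequences are transcribed from the listing above.

```python
# Repeat units as listed above (SEQ ID NOs:8-11).
REPEATS = {
    "SEQ ID NO:8": "TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG",
    "SEQ ID NO:9": "TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG",
    "SEQ ID NO:10": "TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG",
    "SEQ ID NO:11": "TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA",
}
MOTIFS = ("GATA", "CCAAT", "CACCC", "CCACC")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motifs(seq: str, motifs=MOTIFS):
    """Return sorted (position, strand, motif) hits.

    position is the 1-based start on the strand as written; strand "-"
    means the motif reads on the complementary strand.
    """
    hits = []
    for motif in motifs:
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            start = seq.find(pattern)
            while start != -1:
                hits.append((start + 1, strand, motif))
                start = seq.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    for name, seq in REPEATS.items():
        print(name, len(seq), find_motifs(seq))
```

A scan of this kind only reports exact matches; it says nothing about whether a given site is actually bound by a transcription factor.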

Also disclosed are modified forms of the U3 enhancer where the
modified enhancer retains enhancer function. These include:
Enhancers having three or more repeats, where each repeat has one of
the following sequences:

TRTCTAGCTCADGGTTTGTRAAYRCACCAATCAGCACTCTG (SEQ
ID NO:12),

TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG (SEQ
ID NO:8),

TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG (SEQ
ID NO:9),

TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG (SEQ
ID NO:10),

TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA (SEQ ID
NO:11).

Enhancers having three or more repeats, where each repeat has one of
the following sequences:

TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG (SEQ
ID NO:8),

TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG (SEQ
ID NO:9),

TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG (SEQ
ID NO:10),

TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA (SEQ ID
NO:11).

Enhancers having three or more repeats, where each repeat has the
following sequence:

TRTCTAGCTCADGGTTTGTRAAYRCACCAATCAGCACTCTG (SEQ
ID NO:12).
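
The consensus repeat (SEQ ID NO:12) contains the letters R, Y and D. Assuming these are the standard IUPAC ambiguity codes (R = A or G, Y = C or T, D = A, G or T), a repeat unit can be compared with the consensus position by position. The sketch below is illustrative; the function name and the mismatch-list output format are assumptions.

```python
# Standard IUPAC nucleotide ambiguity codes (assumed to apply to SEQ ID NO:12).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

CONSENSUS = "TRTCTAGCTCADGGTTTGTRAAYRCACCAATCAGCACTCTG"  # SEQ ID NO:12

SUBTYPES = {
    "SEQ ID NO:8": "TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG",
    "SEQ ID NO:9": "TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG",
    "SEQ ID NO:10": "TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG",
}


def mismatches(consensus: str, seq: str):
    """Return 1-based positions where seq is not allowed by the consensus code."""
    if len(consensus) != len(seq):
        raise ValueError("sequences must have the same length")
    return [
        i + 1
        for i, (code, base) in enumerate(zip(consensus, seq))
        if base not in IUPAC[code]
    ]


if __name__ == "__main__":
    for name, seq in SUBTYPES.items():
        print(name, mismatches(CONSENSUS, seq))
```

The comparison requires equal lengths, so the shorter repeat unit (SEQ ID NO:11) would first need to be aligned before it could be scored this way.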

Enhancers where the enhancer has from three to fourteen repeat units.
Enhancers where one or more of the repeat units of the enhancer are
deleted, one or more of the repeat units are replaced with a repeat unit of the
enhancer having a different sequence than the repeat unit that is replaced,
one or more repeat units of the enhancer are added to the enhancer, or a
combination of one or more of these modifications.
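
The modifications listed above (deleting, replacing or adding repeat units while keeping three to fourteen units) can be sketched as operations on an ordered list of repeat units. Everything below is a hypothetical illustration: the function names are invented, and the fourteen-unit order is a placeholder, not the order found in the 5'HS5 LTR.

```python
# Repeat units keyed by SEQ ID NO, transcribed from the listing above.
REPEAT_UNITS = {
    8: "TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG",
    9: "TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG",
    10: "TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG",
    11: "TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA",
}


def build_enhancer(unit_ids):
    """Concatenate repeat units into a tandem array of 3 to 14 units."""
    if not 3 <= len(unit_ids) <= 14:
        raise ValueError("expected between 3 and 14 repeat units")
    return "".join(REPEAT_UNITS[i] for i in unit_ids)


def delete_units(unit_ids, positions):
    """Return a new order without the units at the given 0-based positions."""
    drop = set(positions)
    return [u for i, u in enumerate(unit_ids) if i not in drop]


def replace_unit(unit_ids, position, new_id):
    """Return a new order with one unit swapped for a different subtype."""
    if new_id not in REPEAT_UNITS:
        raise KeyError(f"unknown repeat unit: {new_id}")
    changed = list(unit_ids)
    changed[position] = new_id
    return changed


# Placeholder 14-unit order for illustration only.
full = [8, 9, 10, 8, 9, 10, 8, 9, 10, 8, 9, 10, 8, 11]
short = delete_units(full, [3, 4, 5])   # 11 units remain
swapped = replace_unit(short, 0, 10)    # first unit replaced

if __name__ == "__main__":
    print(len(build_enhancer(full)), len(build_enhancer(short)))
```

Whether any particular arrangement retains enhancer function would have to be established experimentally, for example with reporter gene assays.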

The disclosed control sequences can be used, alone or in
combination, to express any gene of interest. For this purpose, the control
sequences can be operably linked to the gene of interest. Preferably, the
gene encodes a protein. Preferably, the control sequences are used in vectors
to obtain expression of a gene of interest in a cell, including cells in animals.
Preferred vectors include retroviral vectors, adenoviral vectors, and other
vectors suitable for gene expression in mammalian cells and/or suitable for
gene therapy. Many vectors are known and the disclosed control sequences
can be used in any of these vectors.

Also disclosed are cells transformed with vectors containing one or
more of the disclosed control sequences, that is, vectors containing one or
more of the disclosed enhancers, insulators, or promoters. Preferred cells are
eukaryotic cells, animal cells, and mammalian cells. Also disclosed is a
method of expressing a protein, the method comprising culturing cells
transformed with a vector containing one or more of the disclosed control
sequences operably linked to the gene. Also disclosed is a method of
expressing a gene in an animal, the method comprising introducing into the
animal cells transformed with a vector containing one or more of the
disclosed control sequences operably linked to the gene. Also disclosed is a
method of expressing a gene in an animal, the method comprising
introducing into cells of an animal a vector containing one or more of the
disclosed control sequences operably linked to the gene.
Insulators 

Insulators are nucleic acid segments that reduce or eliminate the
effect of transcription from adjacent regions on the nucleic acid segment
with which the insulator is associated. The disclosed insulators preferably are
placed upstream of other control sequences and/or downstream of genes.
Insulators are preferably placed between different genes, transcription units,
or genetic domains to reduce or prevent interference of the adjacent
expression sequences. The disclosed insulators are, or are derived from, all
or a functional portion of the U3 insulator (nucleotides 5 to 594 of Figure 2;
nucleotides 5 to 594 of SEQ ID NO:1), or modified forms of the U3
insulator, where a functional portion of the U3 insulator is a portion of the
U3 insulator that retains insulator function.
Promoters 

Promoters are nucleic acid segments that mediate initiation of
transcription. The disclosed promoters are, or are derived from, all or a
functional portion of the U3 promoter (nucleotides 1194 to 1322 of Figure 2;
nucleotides 1194 to 1322 of SEQ ID NO:1), or modified forms of the U3
promoter, where a functional portion of the U3 promoter is a portion of the
U3 promoter that retains promoter function.
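
The nucleotide ranges quoted in this description for the insulator, enhancer, promoter and R region are 1-based and inclusive, so extracting a region from a sequence string requires a small coordinate conversion. The sketch below shows one way to do this. The function names are assumptions, and the placeholder string stands in for SEQ ID NO:1, which is not reproduced here.

```python
# Region boundaries as stated in the text (1-based, inclusive, on SEQ ID NO:1).
REGIONS = {
    "U3 insulator": (5, 594),
    "U3 enhancer": (595, 1193),
    "U3 promoter": (1194, 1322),
    "R region": (1322, 1380),
}


def region_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


def extract(seq: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) of seq."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("interval lies outside the sequence")
    return seq[start - 1:end]


# Placeholder only: a real run would load SEQ ID NO:1 here instead.
placeholder = "ACGT" * 500

if __name__ == "__main__":
    for name, (start, end) in REGIONS.items():
        piece = extract(placeholder, start, end)
        print(f"{name}: {start}-{end}, {len(piece)} bases")
```

As stated, the promoter and R region ranges share nucleotide 1322, so the two extracted pieces overlap by one base.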
Use Of Control Elements

The disclosed enhancers, insulators, and promoters can be used in a
variety of vectors and expression constructs to regulate and promote
transcription of genetic elements placed in the same constructs. The
disclosed control elements are preferably used in retroviral vectors to obtain
expression in mammalian cells, and especially to express genes in cells in, or
to be introduced into, animals (including humans) for gene therapy.
Specific examples of such uses are:

1. The 5'HS5 ERV-9 LTR and/or its component U3 enhancer, insulator,
and promoter, the R and the U5 regions can be used to replace the LTRs or
their equivalent U3, R and U5 regions of retroviral vectors designed for gene
therapy of hereditary or acquired hematological diseases including sickle cell
disease, thalassemias, leukemias and AIDS.

2. The U3 enhancer, insulator, and promoter, and the R region can be
used to activate (and/or insulate) in hematopoietic cells the transcription of a
cis-linked transgene in either viral or non-viral vectors. The host cells for the
transgene can be the hematopoietic stem cells, progenitor cells or mature
lineage differentiated cells such as the erythroid, myeloid or lymphoid cells.

3. Base mutations, and/or rearrangements and substitution of repeat
units, can be introduced into the U3 and R regions to enable the U3 enhancer
and promoter and the R region to work more efficiently in a specific
hematopoietic lineage such as the erythroid, myeloid or lymphoid lineage.
Design of the retroviral vectors and transgenic cassettes.

1. The disclosed enhancers, promoters, R region, and U5 region can be
used to replace the LTRs or their component U3, R and U5 regions of
retroviral vectors designed for gene therapy of hereditary or acquired
hematological diseases. The disclosed insulators can also be added to the
vector. The replacement can be in either the 5' or the 3' LTR or both the 5'
and 3' LTRs of an appropriate retroviral vector. Example constructs are
shown in Figure 11.

U3: the U3 enhancer and promoter of the 5'HS5 ERV-9 LTR
R: the R region of the 5'HS5 ERV-9 LTR
U5: the U5 region of the 5'HS5 ERV-9 LTR
U3E: the U3 enhancer of the 5'HS5 ERV-9 LTR
U3P, R and U5: the U3 promoter, R and U5 regions of appropriate
non-5'HS5 ERV-9 LTRs.
2. Constructs such as those shown in Figure 12 can be used to activate
the transcription of a cis-linked transgene spliced in either viral or non-viral
vectors in hematopoietic cells.

U3: the U3 enhancer and promoter of the 5'HS5 ERV-9 LTR
R: the R region of the 5'HS5 ERV-9 LTR
U5: the U5 region of the 5'HS5 ERV-9 LTR
U3E: the U3 enhancer of the 5'HS5 ERV-9 LTR
U3P: the U3 promoter of the 5'HS5 ERV-9 LTR
R and U5: the R and U5 regions of appropriate non-5'HS5 ERV-9 LTRs.
P: appropriate promoter other than the U3 promoter of the 5'HS5
ERV-9 LTR.

3. The disclosed insulators can be used to insulate integrated transgenes
in hematopoietic and non-hematopoietic cells from transcriptional
interference exerted by the host genome and/or elimination by the host
genome over time, so that the transgene can be efficiently transcribed from
its own enhancer and promoter and can also remain stably integrated in the
host genome over time. Examples of constructs using the disclosed insulators
are shown in Figure 13. Such constructs will have improved expression
consistency and stability because the insulators limit or eliminate the
influence of flanking transcription activities.

The U3 enhancer repeats of the 5'HS5 LTR can also be used to
identify transcription factors that bind to the enhancer. The transcription
factors bound by the DNA motifs in U3 repeats can be identified by
electrophoretic mobility shift assays (EMSA) with nuclear extracts isolated
from cells, such as K562 and placenta trophoblasts, and supershift assays
with antibodies against various known transcription factors. Such techniques
for use with other protein binding sites are well established and can be used
with the disclosed enhancers.

The genes encoding new transcription factors identified through this
process can then be cloned. The molecular architecture and activity of the
U3 enhancer complex can also be examined by site-directed mutagenesis of
the U3 repeats in test plasmids containing the Green Fluorescent Protein
(GFP) reporter gene, following transfection into cells, such as K562, CFU-E
and placental trophoblast cells.
Constructs and Vectors 

The disclosed control elements (that is, the disclosed enhancers,
insulators, and promoters) are useful for expression of any desired gene. For
this purpose, the disclosed control elements can be included in constructs and
vectors designed for expression of genes of interest. Many such vectors are
known. Preferred vectors are those for use in animal cells, and in particular,
those for use in mammalian cells.

Examples of vectors and delivery techniques that can be adapted for
use with the disclosed control elements are described in U.S. Patent No.
5,968,735, U.S. Patent No. 5,965,440, U.S. Patent No. 5,965,358, U.S.
Patent No. 5,932,210, U.S. Patent No. 5,925,565, U.S. Patent No. 5,888,820,
U.S. Patent No. 5,888,767, U.S. Patent No. 5,886,166, U.S. Patent No.
5,871,997, U.S. Patent No. 5,866,696, U.S. Patent No. 5,866,411, U.S.
Patent No. 5,858,744, U.S. Patent No. 5,856,152, U.S. Patent No. 5,837,503,
U.S. Patent No. 5,830,727, U.S. Patent No. 5,817,492, U.S. Patent No.
5,814,482, U.S. Patent No. 5,811,260, U.S. Patent No. 5,795,577, U.S.
Patent No. 5,789,244, U.S. Patent No. 5,783,442, U.S. Patent No. 5,770,400,
U.S. Patent No. 5,759,852, U.S. Patent No. 5,756,264, U.S. Patent No.
5,753,499, U.S. Patent No. 5,744,133, and U.S. Patent No. 5,710,037.

Gene Therapy

The disclosed control elements can be used in vectors and constructs
for gene therapy. "Gene therapy" refers to the treatment of pathologic
conditions by the addition of exogenous nucleic acids to appropriate cells
within the organism. The disclosed control elements can be used to express
and increase the efficiency of expression of genes added in gene therapy.
Nucleic acids must be added to the cell, by transfection or transduction, such
that they remain functional within the cell. The disclosed insulators can
protect introduced genes from interfering endogenous transcription at the site
of insertion. For most gene therapy strategies, the new nucleic acids are
designed to function as new genes, i.e., to code for new RNA or messenger RNA,
which in turn codes for new protein. Alternatively, therapeutic genes can
produce antisense RNAs or ribozymes, which can directly affect cellular or
pathogen functions without having to express protein from mRNA. Gene therapy
can be directed towards monogenetic disorders like adenosine deaminase
deficiency and cystic fibrosis or to polygenetic somatic disorders like cancer.

Human gene therapy has been successfully applied to correct genetic
diseases, including adenosine deaminase deficiency (severe combined
immunodeficiency) ((Approved Protocol) "Treatment of Severe Combined
Immunodeficiency Disease (SCID) Due to Adenosine Deaminase (ADA)
Deficiency with Autologous Lymphocytes Transduced with a Human ADA
Gene" Hum. Gene Ther. 1:327-362 (1990); Anderson, W.F. "Human Gene
Therapy" Science 256:808-813 (1992)) and familial hypercholesterolaemia
(Grossman, et al. Nature Genetics 6:335-341 (1994)). Many new gene therapy
protocols are in progress or being planned (Morgan and Anderson Ann. Rev.
Biochem. 62:191-217 (1993)). Vectors, constructs, and protocols described in
the studies above can be adapted for use with the disclosed control elements.

The rapid implementation of gene therapy in human trials has been
made possible by the development of relatively efficient means of transferring
new nucleic acids into cells, a process generally referred to as "gene
transduction". The clinically applicable gene transduction methods fall into
one of three categories: (a) cationic lipids, (b) molecular conjugates and (c)
recombinant viruses. These different means of accomplishing gene
transfer have been reviewed by Morgan, Ann. Rev. Biochem.
62:191 (1992); Mulligan Science 260:926 (1993); and Tolstoshev Ann. Rev.
Pharm. Toxicol. 32:573 (1993). Any of these transfer systems can be used for
constructs using the disclosed control elements.

Most of the successful human gene therapy protocols utilize vectors
derived from defective murine leukemia retroviruses (Anderson Science
256:808-813 (1992); Miller Nature 357:455-460 (1992); Miller Curr. Top.
Microbiol. Immunol. 158:1-24 (1992); for review of these vectors and the
packaging cell lines, Miller Methods in Enzymology 217:581-599 (1993)).
Although there is a limitation in the size of the gene (up to 7 to 8 kb) that
can be transduced, the retrovirus-based vectors have the advantage that they
can incorporate a permanent copy of the delivered gene into the chromosomes
of the recipient cells and therefore potentially can represent a cure for a
disorder arising due to the expression of an undesirable protein, activation
of an oncogene, or insufficient expression or expression of a defective
protein. Due to their retroviral origins, the disclosed control elements are
particularly suited for use in retroviral vectors.

The majority of the gene transfer procedures used to date for human
gene therapy are ex vivo gene transfers. The recipient cells are
removed from the patient and grown in a cell culture laboratory.
Replication-incompetent, virus-like particles containing the therapeutic gene,
which are produced from packaging cells, are used to transduce the recipient
cells. The transduced recipient cells are then selected by growing in selection
media, expanded and returned to the patient. The packaging cells are
genetically engineered cell lines that, once a therapeutic gene is transferred
into the cells, produce virus-like particles containing the therapeutic gene
to be delivered into other cells.

Other gene transferring vehicles in which the disclosed control
elements can be used are those based on human immunodeficiency virus
(HIV) (Poznansky, et al. J. Virol. 65:532-536 (1991); Buchschacher, et al. J.
Virol. 66:2731-2739 (1992); Shimada, et al. J. Clin. Invest. 88:1043-1047
(1991)) and adeno-associated virus (Chatterjee, et al. Science 258:1485-1488
(1992); Muzyczka Curr. Top. Microbiol. Immunol. 158:97-129 (1992)).

An HIV-based delivery system is believed to be particularly suitable
for gene therapy against AIDS. Not only can the genes transferred by HIV
virus-based vectors be integrated into the genome of non-dividing cells
(Weinberg, et al. J. Exp. Med. 174:1477-1482 (1991); Bukrinsky, et al. Proc.
Natl. Acad. Sci. U.S.A. 89:6580-6584 (1992); Lewis, et al. [published erratum
appears in EMBO J. Nov;11(11):4249 (1992)] EMBO J. 11:3053-3058
(1992)), but the presence of HIV gp120 on the surface of the gene delivering
particles also renders them specific for gene delivery to CD4+ cells.

The U3 enhancer region in the 5'HS5 LTR contains an unusual sequence
of fourteen tandem repeats of 37-41 bases. The tandem repeats are
comprised of four subtypes 1, 2, 3 and 4, which are arranged in the LTR in
the order 1-2-3-4-1-2-3-4-1-2-3-4-4-1. The consensus sequence of the U3
repeats (SEQ ID NO:12) reveals five conserved motifs: GATA, TAGCTCA,
GGTTTGT (or GGTGG/CCACC in subtype 4) and CCAAT. The motifs
GATA, CCAAT and CACC can potentially bind to cognate transcription
factors abundantly expressed in hematopoietic and erythroid cells.
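
As an illustration of how the GATA, CCAAT and CACC motifs named above might be located in a repeat sequence, the following is a minimal sketch that scans both DNA strands for exact motif matches. The sequence `toy_repeat` is a hypothetical string assembled from the motifs named in the text for demonstration only; it is not the consensus of SEQ ID NO:12, and the function names are likewise illustrative assumptions.

```python
# Minimal sketch: exact-match motif scan on both strands.
# toy_repeat is an invented example, NOT the U3 repeat consensus (SEQ ID NO:12).

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted 0-based forward-strand start positions at which the
    motif occurs on either strand (overlapping matches included)."""
    seq = seq.upper()
    hits = set()
    for pattern in {motif.upper(), reverse_complement(motif.upper())}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical repeat built from the motifs named in the text.
toy_repeat = "AGATAGCTCAGGTTTGTCCACCAATCAGCA"

if __name__ == "__main__":
    for motif in ("GATA", "CCAAT", "CACC"):
        print(motif, find_motif(toy_repeat, motif))
```

Scanning both strands reflects the note above that subtype 4 carries GGTGG/CCACC, which is the same site read from opposite strands; a position weight matrix would be the usual choice if degenerate matches were wanted.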

The consensus sequence of U3 repeats shows higher than 90%
sequence homology with that of the U3 repeats of the 3' ERV-9 LTR
located 25 kb 3' of the β-globin gene and of LTR2, a random clone of the
ERV-9 LTR (Figure 3).

The promoter sequence in the LTR is located in the U3 region at the
3' end of the U3 repeats and is immediately upstream of the transcribed R
region whose 5' border marks the transcriptional initiation site for retroviral
RNA synthesis. The promoter of the 5'HS5 LTR shows a sequence
homology of 80% with the promoter of the 3' LTR and of over 90% with
the promoter of LTR2. The transcriptional initiation site of LTR2 has been
determined by primer extension to be located 28 bases downstream of the
AATAAAA box. Because of extensive sequence homologies between the
5'HS5 LTR and the LTR2 promoters, especially the 100% sequence
homology in the 70 DNA bases flanking the AATAAAA box, the
transcriptional initiation site of the 5'HS5 LTR was placed at the identical T
base 28 bases downstream of the AATAAAA box. All three LTR promoters
contain the GATA, CACCC and CCAAT motifs at identical locations, -36,
-46 and -63 bases respectively, relative to the retroviral transcriptional
initiation site.

The 5'HS5 LTR promoter also bears structural similarities with the
promoters of the further downstream ε-, γ- and β-globin genes in that a
combination of similar GATA, CACCC and CCAAT motifs is found also
upstream of the AATAAAA boxes in the globin promoters. In particular, the
5'HS5 LTR and the ε-globin promoter share additional sequence homologies
in the region immediately 5' of the transcriptional initiation site. The above
homologies indicate that, like the globin promoters, the 5'HS5 LTR enhancer
and promoter ought to be active in erythroid cells. Indeed, transfection
assays show that the 5'HS5 LTR exhibits enhancer and promoter activities
and can promote the transcription of cis-linked DNA to relatively high levels
in erythroid cells and in placenta.

The consensus sequence of the modular U3 repeats in the 5'HS5 LTR
reveals that the modular U3 repeat contains five well conserved and recurrent
DNA motifs organized invariably in the following 5'->3' order: GATA,
TAGCTCA, GGTTTGT (or TGGTGGG in subtype 4) and
CACCAATCAGCA (nucleotides 25 to 36 of SEQ ID NO:12). This
invariable sequence structure suggests a definitive organization of the
cognate protein factors in the assembly of the U3 enhancer complex.

The GATA motifs bind to the GATA family of transcription factors
including GATA-1, -2 and -3. Targeted disruptions of the GATA-1, -2 and
-3 genes have been reported to cause severe abnormalities in hematopoiesis
and erythropoiesis, indicating that these factors play important regulatory
roles in erythroid cells. Different GATA factors are expressed at relatively
higher levels in different hematopoietic cells. In CD34+ hematopoietic
stem/progenitor cells, GATA-2 is expressed at a high level relative to
GATA-3 and GATA-1. In erythroid K562 cells, both GATA-1 and GATA-2
are expressed. In CFU-E, GATA-1 is the major detected GATA factor. In
placenta trophoblasts, GATA-2 and GATA-3 are expressed.

The CACCC motifs bind to erythroid transcription factors EKLF and
BKLF. EKLF is expressed at very low levels in K562 cells expressing the
embryonic globin program and at much higher levels in MEL cells
expressing the adult globin program. Unlike EKLF, BKLF is expressed
abundantly in embryonic yolk sac and fetal liver and is not confined to
erythroid cells. However, the motif in the U3 repeats is CACC and not
the CACCC found in the strong EKLF and BKLF binding sites, and may thus
bind to these factors weakly or bind to different factor(s).

The CCAAT motifs may bind to two families of protein factors, the
C/EBPs expressed in various hematopoietic cells and adipocytes and the
ubiquitous NF-Y complex. The C/EBP transcription factors include C/EBP
α, β, γ, δ, ε, and CHOP, a dominant negative inhibitor of the C/EBPs. They
bind to the CCAAT motifs as a homodimer or heterodimer through the bZIP
domain. The CCAAT boxes have been reported to play pivotal roles in the
activities of the globin promoters, suggesting the existence in erythroid cells
of transcription factors that bind to and activate the CCAAT boxes.
However, none of the C/EBP α, β, δ, and ε are present at detectable levels in
erythroid K562 cells and C/EBP γ, a ubiquitous factor, appears to be
expressed mainly in lymphoid cells. This suggests that in K562 cells the
CCAAT box may be bound paradoxically by negative regulators CHOP and
CDP or primarily by the ubiquitous NF-Y complex.

The NF-Y complex, also named CP1, consists of three subunits A, B
and C. All three subunits are required for binding to the CCAAT box as a
trimeric complex through the histone fold motif, which bears similarity to the
DNA binding domain of the histones. The NF-Y factors through the histone
fold domain may also associate with histone acetyltransferase and thus be
able to remodel and open up the chromatin structure of the CCAAT box and
its neighboring DNA. In EMSA gels with nuclear extract from erythroid
cells, after the NF-Y complex was supershifted with antibodies, the CCAAT
box-containing probe still formed shifted complexes. This suggests that
erythroid cells may contain yet unidentified nuclear factors that may bind to
the CCAAT motifs in U3 repeats.

The remaining two conserved sequence motifs TAGCTCA and
GGTTTGT in the U3 repeats may also be bound by yet unidentified
transcription factors present in erythroid cells. It is of interest to note that
motifs similar to TAGCTCA are found also in enhancers and promoters of
genes expressed in various hematopoietic lineages: TAGCCTGA in the
MLV U3 enhancer, TAGCTAA in the promoter of M-CSF receptor gene and
TAGCTTCA in the Invariant Chain promoter of the major histocompatibility
complex.

The enhancers of many genes, including the HS2 enhancer of the
β-globin LCR, usually span several hundred bases and are bound by many
different protein factors, which makes the analysis of the enhancer complex a
complicated task. In contrast, the 14 modular U3 repeats in the 5'HS5 LTR
contain up to four well conserved DNA motifs and may be bound by a
similarly limited number of recurrent protein factors, making it a simpler
task to analyze the structure of this enhancer complex.

Example 

This example describes the cloning and characterization of the 5'
border region of the LCR upstream of human β-like globin genes.

MATERIALS AND METHODS

Isolation of 5' 1.4 phage clone and DNA sequencing: The 5' 1.4
phage clone spanning 12 kb of DNA 5' of the HS4 site was obtained from a
K562 genomic DNA library constructed in EMBL phage (Weber-Benarous
et al, 1988). The library was screened with a unique DNA probe 5' 1.4
located near the HS4 site in the LCR (Li et al, 1985). The genomic DNA
insert contained 8 kb of DNA spanning the HS5 site whose sequence was
subsequently reported (Yu et al, 1994) and 5 kb of further upstream new
DNA. The 8 kb of DNA was cleaved by Hind III into four sub-fragments of
2.7 kb spanning the HS5 site and 1.5, 1.6 and 2 kb spanning the new DNA.
They were subcloned into a plasmid vector (Tuan et al, 1990) and sequenced
with the dideoxy terminator method (Sanger et al, 1977) using the Sequenase
or Taquenase Kit (USB Corp). This sequencing strategy produced unambiguous
DNA sequencing ladders for the entire 8 kb of DNA except for the 1 kb of
DNA in the junction area between the 1.5 and 1.6 kb subclones, which
contained the repetitive sequences of the ERV-9 LTR. The junction DNA
was recloned into a phagemid vector Bluescript II SK(+/-) (Stratagene) and
the single stranded DNA was sequenced as above. The sequences were
assembled and analyzed using the GCG DNA analysis software. The 8 kb
DNA sequence was submitted to GenBank (BankIt 193637 AF064190).

Purification of genomic DNAs from the gorilla and people of
different races: Genomic DNAs were isolated anonymously from human
blood samples collected by the Hemoglobin Laboratory at the Medical
College of Georgia for diagnosis of thalassemia and sickle cell disease.
African samples were from patients homozygous for sickle cell disease or
Hereditary Persistence of Fetal Hemoglobin (HPFH); Arabic and Asian
samples were from people hemizygous for α-thalassemia and the Caucasian
samples were from normal individuals or patients with β-thalassemia. The
gorilla blood sample was obtained from the Yerkes Primate Center of Emory
University. High molecular weight genomic DNAs were purified from
nucleated blood cells (Poncz et al, 1982).

PCR-amplification of the 5'HS5 LTR in genomic DNAs and
sequence analysis of the amplified LTR: The 5'HS5 LTRs were amplified
from genomic DNAs with Primer pair 3 used also for RT-PCR (Figure 10;
forward primer, positions 595-616 and reverse primer 1807-1831, Figure 2;
nucleotides 595 to 616 of SEQ ID NO:1). PCR conditions consisted of an
initial denaturation at 95°C for 1.5 min, followed by 32 cycles of
denaturation at 95°C for 1.5 min, annealing at 59°C for 1 min and extension
at 72°C for 2 min, and a final extension step at 72°C for 15 min. The
amplified LTR fragments were purified by Quantum Plasmid Miniprep Kit
(Bio-Rad) and sequenced by the Molecular Biology Core Laboratory of the
Medical College of Georgia using the cycle sequencing technique with
fluorescent dideoxy terminators.
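
The cycling program stated above can be written down as data and summed to give the programmed hold time. The following is a minimal sketch; the class and function names are illustrative assumptions, and the total excludes temperature ramping between steps, which depends on the instrument.

```python
# Minimal sketch: the PCR program from the text, expressed as data.
# Names are illustrative; ramp times between steps are not modelled.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


INITIAL = Step("initial denaturation", 95, 1.5)
CYCLE = (
    Step("denaturation", 95, 1.5),
    Step("annealing", 59, 1.0),
    Step("extension", 72, 2.0),
)
FINAL = Step("final extension", 72, 15.0)
N_CYCLES = 32


def total_hold_minutes(initial, cycle, final, n_cycles):
    """Sum of programmed hold times: initial + n_cycles * cycle + final."""
    per_cycle = sum(step.minutes for step in cycle)
    return initial.minutes + n_cycles * per_cycle + final.minutes


if __name__ == "__main__":
    minutes = total_hold_minutes(INITIAL, CYCLE, FINAL, N_CYCLES)
    print(f"Programmed hold time: {minutes} min")
```

Each cycle holds for 4.5 min, so the 32 cycles account for 144 min of the 160.5 min programmed total.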

Construction of recombinant CAT plasmids: LTR-CAT
(Construct 1): The 1 kb LTR was amplified from K562 genomic DNA by
PCR with forward primer 5' TACT GTCGAC CTGAGTTTGCTGGGGATG 3'
(positions 3250-3271 in the 8 kb GenBank sequence,
BankIt 193637 AF064190, corresponding to positions 595-616 in Figure 2;
nucleotides 595 to 616 in SEQ ID NO:1) and reverse primer 5'
GATGGATCCTGTGTCCGGAATTGGTGG 3' (positions 4282-4299 in
GenBank sequence; positions 1677-1694 in Figure 2; nucleotides 1677 to
1694 in SEQ ID NO:1). A Sal I and a Bam HI cloning site (underlined) were
added respectively to the forward and reverse primers. The PCR fragment
was cleaved with Sal I and Bam HI enzymes and together with a Bam HI-
Hind III adapter was spliced into a promoterless CAT vector derived from
εp-CAT (Construct 3) in which the ε-globin promoter (εp) was removed with
Sal I and Hind III digestions. Ups-CAT (Construct 2) contains a 1 kb PCR
fragment amplified from the genomic DNA located 2 kb further upstream of
the LTR and was created with the same cloning strategy. The respective
forward and reverse primers were 5'
ACTGTCGACTTATGTATTCAAGTTCG 3' (positions 50-66 in GenBank
sequence; SEQ ID NO:21) and 5'
GATGGATCCAATAGATTTTTGTCATCT 3' (positions 1203-1220 in
GenBank sequence; SEQ ID NO:22). εp-CAT (Construct 3) and HS2-εp-CAT
(Construct 4) were previously made (Tuan et al, 1989). LTR-εp-CAT
(Construct 5) was created with the above 1 kb LTR DNA obtained by PCR,
which was cleaved at the Sal I and Bam HI cloning sites and spliced into
εp-CAT (Construct 3), which was also cleaved at the Sal I and Bam HI sites
located 5' of the εp. HS5-εp-CAT (Construct 6) was created with the same
cloning strategy as LTR-εp-CAT (Construct 5). The 1.2 kb HS5 fragment
was generated by PCR from forward primer 5'
ACTGTCGACAAGCTTCTGACAAATTATTCTT 3' (positions 5431-5455,
GenBank sequence; SEQ ID NO:15) and reverse primer 5'
GATGGATCCACTGAAAGGGCTCATGCAAC 3' (positions 6657-6676,
GenBank sequence; SEQ ID NO:16). LTR-HS5-εp-CAT (Construct 7) was
made from LTR-εp-CAT (Construct 5), which was linearized at the Bam HI
site 3' of the LTR. The above 1.2 kb HS5 fragment obtained by PCR was
cleaved at the 5' end with Hind III (a natural site) and at the 3' end with Bam
HI and together with a Bam HI-Hind III adapter was spliced into the Bam HI
site in LTR-εp-CAT.
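
The cloning sites carried by the primers listed above can be checked with a simple string search. The following is a minimal sketch; the primer sequences are transcribed from the text with spaces removed, the recognition sequences used are the standard ones for Sal I (GTCGAC), Bam HI (GGATCC) and Hind III (AAGCTT), and the dictionary keys and function name are illustrative assumptions.

```python
# Minimal sketch: locate restriction sites in the cloning primers.
# Primer sequences are copied from the text (spaces removed); the labels
# are illustrative only.

SITES = {"SalI": "GTCGAC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

PRIMERS = {
    "LTR forward": "TACTGTCGACCTGAGTTTGCTGGGGATG",
    "LTR reverse": "GATGGATCCTGTGTCCGGAATTGGTGG",
    "Ups forward": "ACTGTCGACTTATGTATTCAAGTTCG",
    "Ups reverse": "GATGGATCCAATAGATTTTTGTCATCT",
    "HS5 forward": "ACTGTCGACAAGCTTCTGACAAATTATTCTT",
    "HS5 reverse": "GATGGATCCACTGAAAGGGCTCATGCAAC",
}


def sites_in(primer):
    """Map enzyme name -> 0-based index of its first site in the primer."""
    return {
        enzyme: primer.find(site)
        for enzyme, site in SITES.items()
        if site in primer
    }


if __name__ == "__main__":
    for label, seq in PRIMERS.items():
        print(f"{label}: {sites_in(seq)}")
```

In this check every forward primer carries a Sal I site and every reverse primer a Bam HI site, and the HS5 forward primer additionally contains a Hind III site directly after the Sal I site, in keeping with the natural Hind III site used to build Construct 7.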

Transient and stable transfections and CAT assays: The transfection
host cells K562, HL60 and MEL were cultured and transfected as
described (Tuan et al, 1989) with modifications. In transient transfections,
10 μg of each of the above CAT plasmids were mixed with 5 μg of a
reference CMV β-gal plasmid and transfected into the host cells by
electroporation. CAT assays were carried out as described (Tuan et al, 1989)
with two modified steps of normalization. The CAT extracts were
normalized first with respect to the total protein in the extract determined
with the BCA (Bicinchoninic acid) protein kit (Pierce) and then with respect
to the β-galactosidase level of the co-transfected CMV β-gal plasmid to
ensure that the CAT assays of different samples were carried out on extracts
containing similar levels of β-gal activity and, therefore, similar amounts of
the transfected test plasmids. The β-gal enzyme levels were determined with
the β-gal Assay Kit (Promega). The CAT enzymatic activities were
analyzed by thin layer chromatography and quantified with a
PhosphorImager (Molecular Dynamics). The results were presented as
percentages of conversion calculated from the 14C counts in the acetylated
chloramphenicol divided by the total input 14C counts of the chloramphenicol
substrate. In stable transfection, pooled cell populations were studied. The
CAT activities were normalized with respect to the copy numbers of the
integrated plasmids determined by Southern blots.
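
The percent conversion and the copy-number normalization described above amount to two short calculations. The following is a minimal sketch; the function names and the numeric values are hypothetical illustrations and are not data from the experiments.

```python
# Minimal sketch of the CAT read-out arithmetic described in the text.
# All numbers used below are invented for illustration.


def percent_conversion(acetylated_counts, total_input_counts):
    """14C counts in acetylated chloramphenicol as a percentage of the
    total input 14C counts of the chloramphenicol substrate."""
    if total_input_counts <= 0:
        raise ValueError("total input counts must be positive")
    return 100.0 * acetylated_counts / total_input_counts


def per_copy_activity(conversion_percent, copy_number):
    """Stable transfection: CAT activity normalized to the copy number of
    the integrated plasmid (determined by Southern blot in the text)."""
    if copy_number <= 0:
        raise ValueError("copy number must be positive")
    return conversion_percent / copy_number


if __name__ == "__main__":
    pct = percent_conversion(acetylated_counts=2500, total_input_counts=10000)
    print(f"conversion: {pct:.1f}%")
    print(f"per-copy activity: {per_copy_activity(pct, copy_number=5):.1f}")
```

The protein and β-gal normalizations are applied upstream of this calculation, by equalizing the extracts that enter the assay, so they do not appear as terms here.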

Isolation of total cellular RNAs and RT-PCR: Total cellular
RNAs were purified from freshly harvested, non-transfected human erythroid
K562, promyelocyte HL60, embryonic teratocarcinoma N-Tera (obtained
from ATCC) and murine erythroleukemia MEL cell lines, adult human
peripheral blood CFU-E and T-lymphocytes (Wickrema et al, 1992) and full
term human placenta. The RNAs were purified with the Totally RNA Kit
(Ambion). For a semi-quantitative comparison of the RT-PCR bands
generated by different primer pairs, each RNA was first reverse transcribed
with random hexamers as primers into a cDNA master stock,
which was then aliquoted into separate tubes for PCR with different primer
pairs as described (Kong et al, 1997). The 5'->3' sequences of the
respective forward and reverse primers are marked in Figure 2. Primer pair
1: CTGAGTTTGCTGGGGATGCGAA (positions 595-616; SEQ ID NO:17)
and GATTTAGTGACTCATATTGTTTCTGA (positions 1700-1726; SEQ
ID NO:18); Primer pair 2: TGCTGCTGCTCACTGTTTGGGTCTA
(positions 1349-1373; SEQ ID NO:19) and the same reverse primer
as that of Primer pair 1. Primer pairs 3 and 4 contain the same forward
primers as the respective forward primers of Primer pairs 1 and 2. Primer
pairs 3 and 4 contain a common reverse primer:
5' GGGCACTCTGCCTTAGGGAGTAACA 3' (positions 1807-1831; SEQ
ID NO:20). The human β-actin primer pair was obtained from Stratagene.
Before RT-PCR, the abilities of the primer pairs to produce amplification
fragments were confirmed by PCR with genomic DNA templates.
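
The fragment length expected from each primer pair on a contiguous genomic template follows from the Figure 2 coordinates quoted above. The following is a minimal sketch; it assumes 1-based inclusive coordinates on a single uninterrupted template, and the variable and function names are illustrative.

```python
# Minimal sketch: expected amplicon length from the primer coordinates in
# the text, assuming 1-based inclusive positions on a contiguous template.

FORWARD = {"F595": (595, 616), "F1349": (1349, 1373)}
REVERSE = {"R1726": (1700, 1726), "R1831": (1807, 1831)}

PRIMER_PAIRS = {
    1: ("F595", "R1726"),
    2: ("F1349", "R1726"),
    3: ("F595", "R1831"),
    4: ("F1349", "R1831"),
}


def amplicon_length(pair_number):
    """Length from the first base of the forward primer to the last base
    of the reverse primer, inclusive."""
    fwd_name, rev_name = PRIMER_PAIRS[pair_number]
    start = FORWARD[fwd_name][0]
    end = REVERSE[rev_name][1]
    return end - start + 1


if __name__ == "__main__":
    for number in sorted(PRIMER_PAIRS):
        print(f"Primer pair {number}: {amplicon_length(number)} bp")
```

Under these assumptions Primer pair 3 spans 1237 bp, which agrees with the fragment of approximately 1.2 kb amplified from genomic DNA with this pair.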

RESULTS

An LTR-retrotransposon of the ERV-9 family of human
endogenous retroviruses is located proximal to the HS5 site in the 5'
boundary area of the LCR: In order to study the sequence and function of
DNA in the boundary area of the LCR, a K562 DNA library was screened
(Weber-Benarous et al, 1988) and a clone was obtained containing 8 kb of
DNA sequence that spans the HS5 site and 5 kb of new further upstream DNA.
As the sequence features of the upstream DNA were previously unknown, the
5 kb new DNA as well as the 3 kb DNA spanning the HS5 site was
sequenced (GenBank accession number: BankIt 193637 AF064190). The
DNA sequence of the 3 kb DNA spanning the HS5 site is in general
agreement with the DNA sequence of this region reported earlier (Yu et al,
1994), except for a number of polymorphic base differences. In the new
DNA, sequence matches using the GCG and BLAST programs revealed the
existence of a solitary LTR at a location within 2 kb 5' of the HS5 site (Long
et al, 1995) (Figure 1). Comparison with a few selected homologous
sequences in the GenBank data base, including the LTR sequence located 5'
of the ZNF80 protein gene (Di Cristofano et al, 1995; GenBank Accession
No. X83497), showed that the 5'HS5 LTR spans 1.7 kb of DNA (Figure 2)
and belongs to the ERV-9 family of human endogenous retroviruses (La
Mantia et al, 1991; Lania et al, 1992).

Consistent with a common property of the retrotransposons, the
5'HS5 LTR is flanked by 4 bases of direct repeats GTAT in the genomic
DNA immediately 5' and 3' of the LTR sequence (Figure 2). This indicates
that the 5'HS5 LTR was inserted into the human ancestral genome at the
GTAT site sometime during evolution. In line with the general LTR
structure of mammalian retroviruses (Temin, 1982), the 5'HS5 LTR contains
the U3, R and U5 regions and is bracketed by the dinucleotides TG and CA
respectively at the 5' and 3' ends (Figure 2). The U3 region contains the
viral enhancer spanning tandemly repeated DNA sequences and the viral
promoter (Lenz et al, 1984; Golemis et al, 1990; La Mantia et al, 1991;
Anagnou et al, 1995). The R region starts with the viral transcription
initiation site (La Mantia et al, 1992) and is followed by the U5 region
(Figure 1). In the U3 region, the 600 DNA bases preceding the U3 repeats
are comprised of 70% G and C bases. This GC-rich region is found in many
of the homologous ERV-9 LTRs in the data base but is not present in the
LTR of the ERV-9 provirus (La Mantia et al, 1991). The U3 enhancer
repeats and the promoter in the 5'HS5 LTR show 80-90% base identities
with other ERV-9 LTRs found in the human genome (Yang et al, 1983; La
Mantia et al, 1991; Lania et al, 1992; Di Cristofano et al, 1995).

It is of interest to note that in addition to the 5'HS5 LTR located
approximately 25 kb 5' of the ε-globin gene, another ERV-9 LTR is located
at a position approximately 25 kb 3' to the β-globin gene (Figure 1). The
repetitive DNA in the region 3' of the β-globin gene was first reported by
Henthorn et al (1986) and subsequently studied by Anagnou et al (1995).
Although neither of those groups recognized that the repetitive DNA was
part of an endogenous LTR, sequence matches as shown above revealed that
the repetitive DNA of this region bears sequence identities of 80-90% with
the U3, R and U5 regions of the 5'HS5 LTR. Thus, two copies of the ERV-9
LTRs exist in flanking positions of the β-globin gene cluster.

Sequence analysis of the U3 enhancer region in the 5'HS5 ERV-9
LTR: The U3 enhancer region of the 5'HS5 LTR shows an interesting
sequence structure. It is comprised of fourteen tandem repeats of a
consensus DNA sequence of 37-41 bases (Figure 2). Sequence matches
show that the tandem repeats are comprised of four subtypes 1, 2, 3 and 4,
which are arranged in the LTR in the order 1-2-3-4-1-2-3-4-1-2-3-4-4-1
(Figure 3). Among the four subtypes, the sequence identities are 60-80%,
using subtype 2 as the reference. Among the U3 repeats of each subtype, the
sequence identities are 80-98% (Figure 3). The consensus sequence of the
fourteen U3 repeats (Figure 3) reveals recurrent sequence motifs that can
potentially bind to the GATA (Ko and Engel, 1993; Merika and Orkin,
1993), CCAAT (Johnson and McKnight, 1989) and CACCC (Miller and
Bieker, 1993; Crossley et al, 1996) transcription factors. Altogether, the U3
enhancer region contains within 600 bases of DNA eight GATA, nine CCAAT,
three CACCC and four CCACC sites. The consensus sequence of the
fourteen U3 repeats shows higher than 90% sequence identity with that of
the seven U3 repeats in the 3'β LTR (Henthorn et al, 1986) and of the six U3
repeats in LTR2, a random clone of the ERV-9 LTR (Lania et al, 1992)
(Figure 3).
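
The subtype arrangement given above can be tallied directly, and the stated repeat length of 37-41 bases bounds the length of the repeat array. The following is a minimal sketch; the variable names are illustrative assumptions.

```python
# Minimal sketch: tally the U3 repeat subtypes and bound the array length
# from the subtype order and the 37-41 base repeat length given in the text.
from collections import Counter

SUBTYPE_ORDER = "1-2-3-4-1-2-3-4-1-2-3-4-4-1"
REPEAT_LEN_MIN, REPEAT_LEN_MAX = 37, 41

subtypes = [int(token) for token in SUBTYPE_ORDER.split("-")]
subtype_counts = Counter(subtypes)
n_repeats = len(subtypes)
array_span_bp = (n_repeats * REPEAT_LEN_MIN, n_repeats * REPEAT_LEN_MAX)

if __name__ == "__main__":
    print("repeats:", n_repeats)
    print("per subtype:", dict(sorted(subtype_counts.items())))
    print("array length range (bp):", array_span_bp)
```

The fourteen repeats divide into four each of subtypes 1 and 4 and three each of subtypes 2 and 3, and the array spans 518 to 574 bases, which fits within the 600 bases cited for the U3 enhancer region.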

Sequence analysis of the U3 promoter region: The promoter
sequence in the LTR is located in the U3 region at the 3' end of the fourteen
U3 repeats. It is located immediately upstream of the R region whose 5'
border marks the transcriptional initiation site for retroviral RNA synthesis
(Temin, 1982) (Figure 3). The promoter of the 5'HS5 LTR shows a
sequence homology of 80% with the promoter of the 3'β LTR and of over
90% with the promoter of LTR2 (Figure 4). The transcriptional initiation
site of LTR2 has been determined by primer extension to be located 28 bases
downstream of the AATAAAA box (La Mantia et al, 1992; Strazzullo et al,
1994). Because of extensive sequence identities between the 5'HS5 LTR
and the LTR2 promoters, especially the 100% sequence homology in the 70
DNA bases flanking the AATAAAA box, the presumptive transcriptional
initiation site of the 5'HS5 LTR was placed at the identical T base 28 bases
downstream of the AATAAAA box (Figure 4). All three LTR promoters
contain the GATA, CACCC and CCAAT motifs at identical
locations, -36, -46 and -63 bases respectively, relative to the retroviral
transcriptional initiation site (Figure 4).

The 5'HS5 LTR promoter also bears structural similarities with the
promoters of the further downstream ε-, γ- and β-globin genes (Baralle et al,
1980; Shen et al, 1981; Poncz et al, 1983; Li et al, 1985) in that a
combination of similar GATA, CACCC and CCAAT motifs is found also
upstream of the AATAAAA boxes in the globin promoters (Nienhuis et al,
1984). In particular, the LTR promoter and the ε-globin promoter share
additional sequence identities in the region immediately 5' of the
transcriptional initiation site (Figure 4). The above sequence and structural
homologies suggest that, like the globin promoters, the 5'HS5 LTR promoter
would be active in erythroid cells.

The 5'HS5 ERV-9 LTR is conserved in the genomes of the
gorilla and of people of different racial lineages: As the 5'HS5 LTR is
apparently a retrotransposon and is located not near but far upstream of the
β-like globin genes, it was possible that the 5'HS5 LTR might have resulted
from a recent insertional event in the K562 genome during cell culture and
did not serve a relevant cellular function. However, were this the case, the
5'HS5 LTR would not be present in the genome of the gorilla, which diverged
from the human genome approximately 10 million years ago (Sibley and
Ahlquist, 1987), nor in the genomes of people of different racial lineages,
which diverged approximately 100,000 years ago (Vogel and Motulsky,
1986). To examine this issue, PCR was used to detect the presence or
absence of the 5'HS5 LTR in the genomic DNAs isolated from the blood
samples of the gorilla and people of different races. The PCR primers were
synthesized according to the K562 DNA sequence and amplified 1.2 kb of
the 5'HS5 LTR including 130 bases of genomic DNA downstream of the LTR
(see Methods and Figure 2).

The PCR results indicate that the 5'HS5 ERV-9 LTR is conserved in
the genomes of the gorilla and people across racial lines. Fifteen out of a
total of 17 human DNAs isolated from Africans, Arabs, Asians and Caucasians
and from human cell lines K562 and HL60 produced amplicons of the
anticipated length of 1.2 kb. However, two of the nine African DNAs
produced either a shorter amplicon of 1.1 kb or both a longer 1.4 kb and a
shorter 1.1 kb amplicon, while the gorilla DNA produced an even shorter
amplicon of 0.9 kb (Figure 6).

It was possible that the observed amplicons might be spurious PCR
products amplified by the primer pair from other ERV-9 LTRs in the human
or the gorilla genome, since the 5' primer was located within the U3 region
immediately upstream of the enhancer repeats, a region also present in
some of the other ERV-9 LTRs, even though the 3' primer was located in the
unique genomic DNA region (see Figure 2). Therefore, the authenticity of
the amplicons was further confirmed by DNA sequencing. Four standard
amplicons of 1.2 kb from two Caucasian and two African DNAs, two shorter
amplicons of 1.1 kb from the African DNAs, and the 0.9 kb amplicon of the
gorilla DNA were sequenced (Figure 5). The electropherograms of the DNA
sequences showed sharp DNA sequence ladders with only a couple of
ambiguities where two different bases occupied the same sequence positions,
indicating that the two homologous chromosomes contained base
polymorphisms at these positions. All the sequenced amplicons showed base
identities of 98-99% in both the LTR and the 3' flanking genomic DNA; the
only exception was the smaller number of U3 repeats in some people and in
the gorilla (Figures 5 and 6). If the sequenced amplicons had also contained
amplification products generated from other homologous ERV-9 LTRs,
the electropherograms would have contained too many sequence ambiguities
to generate clearly readable sequences. The above observations indicate that
the amplicons were genuine products of the 5'HS5 LTR in the human and
gorilla genomes.
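
The base-identity comparison described above can be illustrated with a small calculation. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length and that gap positions (written "-") are excluded; the function name and the toy sequences are illustrative only and are not taken from the sequencing data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical bases over aligned, non-gap positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # assumption: gap positions are not counted
        compared += 1
        if base_a == base_b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


# Toy example (hypothetical sequences): one mismatch in ten aligned bases.
print(percent_identity("GTATTGAGAG", "GTATTGACAG"))
```

Whether gaps should count as mismatches is a design choice; excluding them keeps a difference in U3 repeat number from lowering the identity of the flanking sequence.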

In both the shorter human amplicons containing eleven U3 repeats,
the deletion of three complete U3 repeats was apparently generated by the
same in-phase deletion event, so the subtype organizations of both amplicons
were identical, 1-2-3-4-1-2-3-4-1-2-1 (Figures 5 and 6). In the gorilla
amplicon with five U3 repeats, the subtype organization is 1-2-3-4-1 (Figures
5 and 6). The apparent genomic insertion site of the LTR, the GTAT
sequence, is conserved in both the human and gorilla amplicons (Figure 5).
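
The relation between U3 repeat number and amplicon length can be checked with simple arithmetic. The sketch below assumes the amplicon sizes printed in Figure 6 (1240, 1120 and 880 bases) and an average repeat unit of 40 bases; the 40-base unit is an inference from those sizes and is not stated in the text, and the function name is illustrative.

```python
FULL_LENGTH_BP = 1240   # amplicon carrying all 14 U3 repeats (size label from Figure 6)
FULL_REPEATS = 14
REPEAT_UNIT_BP = 40     # assumed average length of one U3 repeat


def expected_amplicon_bp(n_repeats: int) -> int:
    """Expected amplicon length when only the number of U3 repeats changes."""
    if not 0 <= n_repeats <= FULL_REPEATS:
        raise ValueError("repeat count out of range")
    return FULL_LENGTH_BP - (FULL_REPEATS - n_repeats) * REPEAT_UNIT_BP


for label, repeats in [("human, 14 repeats", 14),
                       ("human, 11 repeats", 11),
                       ("gorilla, 5 repeats", 5)]:
    print(label, expected_amplicon_bp(repeats))
```

Under these assumptions, the loss of three repeats accounts for the 1.1 kb human amplicon and the loss of nine repeats for the 0.9 kb gorilla amplicon.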
The remarkable sequence identities in the 5'HS5 LTR between
human and gorilla and among people of different races indicate that this LTR
was probably inserted into the 5' boundary area of the β-globin LCR at least
10 million years ago, before the divergence of humans and apes, and that it has
been conserved in the genomes of the higher primates during the ensuing
years of evolution. These observations indicate that this 5'HS5
LTR-retrotransposon is likely conserved for the preservation of a relevant cellular
function of the host.

The 5'HS5 ERV-9 LTR possesses enhancer and promoter
activities in erythroid cells: To demonstrate that the enhancer and promoter
regions in the 5'HS5 LTR possess enhancer and promoter activities, seven
recombinant CAT plasmids were made (Figure 7). LTR-CAT (Construct 1)
contained the 1 kb LTR spanning the 14 U3 enhancer repeats, U3 promoter,
R and U5 spliced 5' of the CAT gene in the absence of a promoter in the
vector. To determine whether other regions of the 5' boundary area of the
LCR also possessed enhancer and promoter activities, the control Ups-CAT
plasmid (Construct 2) contained a 1 kb DNA (Ups) located further upstream
of the LTR (Figure 1). The HS2-εp-CAT plasmid (Construct 4), which
contained the strong HS2 enhancer of the LCR (Tuan et al, 1989) coupled to
the ε-globin promoter (εp), served as the standard with which to compare the
enhancer and promoter activities of the 5'HS5 LTR. To test whether the enhancer
in the 5'HS5 LTR can synergize with and activate the HS5 site located naturally
downstream of and proximal to the LTR, LTR-εp-CAT, HS5-εp-CAT and
LTR-HS5-εp-CAT (Constructs 5, 6 and 7; Figure 7) contained respectively the
LTR and HS5 site spliced either separately or together into εp-CAT
(Construct 3). The plasmids were transiently transfected into erythroid K562
and MEL cells and nonerythroid HL60 cells and stably integrated into K562
cells.

Transient transfection results indicate that in human erythroid K562
cells, the LTR in the LTR-CAT plasmid displayed enhancer and promoter
activities that were approximately 50% of those of the combination of the HS2
enhancer and the ε-globin promoter in the HS2-εp-CAT plasmid. In
contrast, in murine erythroid MEL cells and human nonerythroid HL60 cells,
both LTR-CAT and HS2-εp-CAT displayed much lower enhancer and
promoter activities (Figure 8). The low enhancer activity of the HS2
enhancer in MEL cells was apparently due to the inactivity of the cis-linked
embryonic ε-globin promoter in MEL cells expressing the adult globin
program; when linked to the more permissive adult β-globin promoter, the
HS2 enhancer displayed much higher enhancer activity in MEL cells
(Cavallesco and Tuan, 1997). Likewise, the U3 enhancer in the LTR may
also be potentially active in MEL cells; its apparently low enhancer activity
may be due to the low activity in MEL cells of the U3 promoter, which shares
certain sequence identities with the ε-globin promoter (Figure 4).

When stably integrated into the genome of K562 cells, the LTR
displayed enhancer and promoter activities that were approximately 30% of
those of the HS2-εp-CAT plasmid (Figure 9). However, in the integrated
LTR-HS5-εp-CAT plasmid, the LTR enhancer synergized with the HS5 site and
activated the CAT gene to a level comparable to that displayed by the HS2
enhancer in HS2-εp-CAT (Figure 9). These results indicate that the 5'HS5
LTR possesses enhancer and promoter activities in erythroid cells and that it
synergizes with and activates the HS5 site.

The endogenous 5'HS5 LTR activates the transcription of
downstream DNA preferentially in erythroid cells: It was next
determined whether the endogenous 5'HS5 LTR also exhibits enhancer and
promoter activities and can activate the transcription of the downstream R
region and the flanking genomic DNA in the β-globin LCR. The
transcriptional statuses of the 5'HS5 LTR and downstream genomic DNA
were determined by RT-PCR in erythroid K562 cells and in non-erythroid
T-lymphocytes and placental cells.

Four PCR primer pairs were made (Figure 10). Primer pair 1 was
synthesized to determine if the entire LTR between the U3 enhancer and the
U5 regions as well as the genomic DNA immediately downstream of it was
transcribed. Primer pair 2 was synthesized to detect retroviral mRNA
transcripts of the R and U5 regions whose synthesis was activated by the U3
enhancer and promoter. In order to ensure that Primer pair 2 detected the
RNA transcribed specifically from the 5'HS5 LTR and not RNAs transcribed
from other ERV-9 LTRs, the forward primer was located in the R region, which
contains a number of polymorphic bases among the ERV-9 LTRs (Figure 2;
Henthorn et al, 1986 and Lania et al, 1992), and the reverse primer was located
in the genomic DNA immediately downstream of the LTR. Primer pairs 3
and 4 were synthesized to confirm that the RNAs detected by Primer pairs 1
and 2 were indeed transcribed from the 5'HS5 ERV-9 LTR. These two primer
pairs contained the same two respective forward primers as Primer pairs 1
and 2 but shared a common reverse primer located in the genomic DNA 110
bases further downstream of the reverse primer of Primer pairs 1 and 2.
Hence, the authentic RT-PCR bands of the 5'HS5 LTR generated by these
primer pairs would be 110 bases longer than those generated respectively by
Primer pairs 1 and 2 (Figure 10).

Consistent with the design of the primer pairs (Figure 10), the sizes
of the RT-PCR bands produced by Primer pairs 3 and 4 were indeed longer
by 110 bases than those produced by Primer pairs 1 and 2. This indicates
that the RT-PCR bands generated by Primer pairs 1-4 were genuine products
amplified from the 5'HS5 LTR and not from other ERV-9 LTRs in the
human genome. In addition, the authenticity of the PCR band produced by
Primer pair 3 had been confirmed by direct DNA sequencing (Figure 5).
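
The size check described above can be written as a short consistency test. This is a sketch only: the band sizes used (1130, 380, 1240 and 490 bases) are the numbers that survive on the primer-pair drawing sheet, and their assignment to Primer pairs 1 to 4 is an assumption made here for illustration.

```python
from typing import Dict

EXPECTED_OFFSET_BP = 110  # distance between the two reverse primers, per the text


def offsets_consistent(band_sizes: Dict[int, int],
                       offset: int = EXPECTED_OFFSET_BP) -> bool:
    """True if pairs 3 and 4 give bands exactly `offset` bases longer than pairs 1 and 2."""
    return (band_sizes[3] - band_sizes[1] == offset
            and band_sizes[4] - band_sizes[2] == offset)


# Assumed assignment of the drawing-sheet sizes to Primer pairs 1-4.
observed = {1: 1130, 2: 380, 3: 1240, 4: 490}
print(offsets_consistent(observed))
```

A band amplified from a different ERV-9 LTR would not be expected to share the same unique downstream genomic sequence, so it would generally fail this offset test.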

For a semi-quantitative comparison of the intensities of RT-PCR
bands generated by Primer pairs 1-4 in different RNA samples, a β-actin
primer pair spanning a region in the ubiquitous β-actin mRNA, assumed to be
expressed at a constant level in different cell types, was included in the
RT-PCRs. Consistent with this assumption, the intensities of the β-actin band
generated by the same amount of different RNAs were similar. The relative
intensities of the LTR bands with respect to the intensity of the β-actin band
generated from aliquots of the same cDNA master stock as the LTR bands
(see Methods) were then compared.
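
The normalization described above amounts to dividing each LTR band intensity by the β-actin band intensity from the same cDNA stock. The following is a minimal sketch; the function name and the intensity values are hypothetical and do not represent measured data.

```python
def relative_intensity(ltr_band: float, actin_band: float) -> float:
    """LTR band intensity expressed relative to the beta-actin band of the same sample."""
    if actin_band <= 0:
        raise ValueError("beta-actin intensity must be positive")
    return ltr_band / actin_band


# Hypothetical densitometry readings: (LTR band, beta-actin band) per RNA sample.
samples = {"sample A": (80.0, 100.0), "sample B": (15.0, 100.0)}
for name, (ltr, actin) in samples.items():
    print(name, relative_intensity(ltr, actin))
```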

The RT-PCR results indicate that the endogenous 5'HS5 LTR
promoted the transcription of the R and U5 regions. In both erythroid and
nonerythroid cells, Primer pairs 2 and 4 generated amplification bands of the
R and U5 regions. However, the LTR enhancer and promoter appear to be
more active in erythroid than in nonerythroid cells, as the amplification
bands generated from RNAs of K562 cells and CFU-E were relatively
stronger than those of nonerythroid T-lymphocytes, N-Tera and HL60 cells.
An apparent exception to the above observation was the nonerythroid
placenta, which also generated strong LTR bands. This may be due to
contamination of the placenta by abundant maternal and fetal blood erythroid
cells in which the 5'HS5 LTR enhancer and promoter were active. On the
other hand, the 5'HS5 LTR enhancer and promoter may also be active in the
placenta, since many HERVs and their solitary LTRs have been found to be
capable of initiating viral RNA synthesis from the R region in placental cells
(Wilkinson et al, 1994; Lower et al, 1996).

Further upstream of the R region in the LTR, no additional
transcriptional initiation sites appear to exist in the majority of the cell types
tested, since Primer pairs 1 and 3 did not generate detectable bands from
RNAs of erythroid K562 and nonerythroid T-lymphocytes, N-Tera and
HL60 cells. However, Primer pairs 1 and 3 generated faint amplification
bands from erythroid CFU-E and nonerythroid placenta RNAs. This
suggests that CFU-E and placenta may contain additional transcriptional
initiation sites proximal to the 5'HS5 LTR.

The above RT-PCR results indicate that the endogenous 5'HS5 LTR 
possesses apparent enhancer and promoter activities and is capable of 
promoting the transcription of the R and U5 regions in the LTR and of 
further downstream genomic DNA in the LCR. 

DISCUSSION

This example shows that a solitary ERV-9 LTR with the
characteristics of a retrotransposon is located proximal to the HS5 site in the
apparent 5' boundary area of the β-globin LCR. This 5'HS5 ERV-9 LTR
possesses unusual sequence features in the multiple tandem repeats of the U3
enhancer region. The U3 repeats and the immediately downstream U3
promoter contain within 700 DNA bases nine GATA, four CACCC and ten
CCAAT sites. These DNA motifs can bind respectively to the cognate
GATA (Orkin, 1992) and CACCC (Miller and Bieker, 1993; Crossley et al,
1996) transcription factors expressed abundantly in erythroid cells and to the
CCAAT factors C/EBP (Johnson and McKnight, 1989) and NF-Y (Bi et al,
1997), expressed in many hematopoietic and nonhematopoietic cells. The
high concentration of these motifs in the U3 region suggests that the 5'HS5
ERV-9 LTR may be preferentially active in erythroid cells.
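
Motif counts of the kind quoted above can be obtained by scanning both DNA strands for each motif. The following is a minimal sketch, assuming exact matches to GATA, CACCC and CCAAT on either strand; the function names are illustrative, and the counting rules actually used for the figures in the text are not stated. The example sequence is the repeat unit of SEQ ID NO:9.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def count_sites(seq: str, motifs=("GATA", "CACCC", "CCAAT")) -> dict:
    """Motif counts summed over both strands (none of these motifs is palindromic)."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    return {m: count_overlapping(forward, m) + count_overlapping(reverse, m)
            for m in motifs}


repeat_unit = "TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG"  # SEQ ID NO:9
print(count_sites(repeat_unit))
```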

The 5'HS5 LTR is conserved in the gorilla and in people of different
racial lineages, indicating that this LTR was probably inserted into its
location at the 5' boundary area of the LCR before species divergence
between human and gorilla approximately 10 million years ago. The
conservation of the 5'HS5 LTR during evolution of the higher primates
suggests that this LTR-retrotransposon may serve a relevant cellular function
of the host.

Functional tests with the CAT reporter gene assays show that the
5'HS5 LTR, in line with its component sequence motifs, possesses enhancer
and promoter activities preferentially in erythroid cells. Moreover, the LTR
enhancer activity can synergize with and activate the cis-linked HS5 site in
the LCR.

CLAIMS

1. A nucleic acid molecule comprising all or a functional portion of the
U3 enhancer (nucleotides 595 to 1193 of SEQ ID NO:1), wherein a functional
portion is a portion of the U3 enhancer that retains enhancer function.

2. The nucleic acid molecule of claim 1 further comprising all or a 
functional portion of the U3 insulator (nucleotides 5 to 594 of SEQ ID NO: 1) 
operably linked to the enhancer, wherein a functional portion of the U3 insulator 
is a portion of the U3 insulator that retains insulator function. 

3. The nucleic acid molecule of claim 1 further comprising all or a 
functional portion of the U3 promoter (nucleotides 1194 to 1322 of SEQ ID
NO: 1) operably linked to the enhancer, wherein a functional portion of the U3 
promoter is a portion of the U3 promoter that retains promoter function. 

4. The nucleic acid molecule of claim 1 further comprising the U3 R 
region (nucleotides 1322 to 1380 of SEQ ID NO: 1) operably linked to the 
enhancer. 

5. The nucleic acid molecule of claim 1 further comprising a gene
operably linked to the enhancer. 

6. The nucleic acid molecule of claim 2 wherein the gene encodes a
protein.

7. A vector comprising the nucleic acid molecule of claim 6. 

8. A vector comprising the nucleic acid molecule of claim 5. 

9. The vector of claim 8 wherein the vector is a retroviral vector. 

10. A cell transformed with the vector of claim 8.

11. The cell of claim 10 wherein the cell is a mammalian cell.

12. The cell of claim 11 wherein the cell is a cell in an animal.

13. A method of expressing a protein, the method comprising culturing 
the transformed cell of claim 7, wherein the protein encoded by the protein 
encoded by the gene is expressed. 

14. A method of expressing a gene in an animal, the method comprising
introducing the transformed cell of claim 10 into an animal, wherein the gene is
expressed.

15. A method of expressing a gene in an animal, the method comprising
introducing the vector of claim 8 into cells of an animal, wherein the gene is 
expressed. 

16. A nucleic acid molecule comprising all or a functional portion of the 
U3 insulator (nucleotides 5 to 594 of SEQ ID NO:1), wherein a functional
portion is a portion of the U3 insulator that retains insulator function. 

17. The nucleic acid molecule of claim 16 further comprising all or a
functional portion of the U3 promoter (nucleotides 1194 to 1322 of SEQ ID
NO:1) operably linked to the insulator, wherein a functional portion of the U3
promoter is a portion of the U3 promoter that retains promoter function.

18. The nucleic acid molecule of claim 16 further comprising the U3 R
region (nucleotides 1322 to 1380 of SEQ ID NO:1) operably linked to the
insulator. 

19. The nucleic acid molecule of claim 16 further comprising a gene 
operably linked to the insulator.

20. The nucleic acid molecule of claim 19 wherein the gene encodes a
protein.

21. A vector comprising the nucleic acid molecule of claim 20. 

22. A vector comprising the nucleic acid molecule of claim 19. 

23. The vector of claim 22 wherein the vector is a retroviral vector. 

24. A cell transformed with the vector of claim 22. 

25. The cell of claim 24 wherein the cell is a mammalian cell. 

26. The cell of claim 25 wherein the cell is a cell in an animal. 

27. A method of expressing a protein, the method comprising culturing 
the transformed cell of claim 21, wherein the protein encoded by the protein 
encoded by the gene is expressed. 

28. A method of expressing a gene in an animal, the method comprising 
introducing the transformed cell of claim 24 into an animal, wherein the gene is 
expressed. 

29. A method of expressing a gene in an animal, the method comprising
introducing the vector of claim 22 into cells of an animal, wherein the gene is
expressed.

30. A nucleic acid molecule comprising a modified U3 enhancer, wherein
one or more of the repeat units of the enhancer are deleted, one or more of the 
repeat units are replaced with a repeat unit of the enhancer having a different 
sequence than the repeat unit that is replaced, one or more repeat units of the 
enhancer are added to the enhancer, or a combination of one or more of these 
modifications, 

wherein the modified enhancer retains enhancer function. 

31. A nucleic acid molecule comprising an enhancer, wherein the
enhancer has three or more repeats, wherein each repeat has one of the following
sequences:

TRTCTAGCTCADGGTTTGTRAAYRCACCAATCAGCACTCTG (SEQ ID NO:12),

TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG (SEQ ID NO:8),

TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG (SEQ ID NO:9),

TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG (SEQ ID NO:10),

TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA (SEQ ID NO:11).

32. The nucleic acid molecule of claim 31, wherein each repeat has one 
of the following sequences: 

TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG (SEQ ID 
NO:8), 

TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG (SEQ ID 
NO:9), 

TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG (SEQ ID 
NO: 10), 

TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA (SEQ ID NO:11).

33. The nucleic acid molecule of claim 31, wherein each repeat has the 
following sequence: 

TRTCTAGCTCADGGTTTGTRAAYRCACCAATCAGCACTCTG (SEQ ID
NO:12).

34. The nucleic acid molecule of claim 31 wherein the enhancer has
from three to fourteen repeat units.

35. A nucleic acid molecule comprising an enhancer, wherein the
enhancer is a primate 5'HS5 ERV-9 LTR enhancer.
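
The repeat sequence of SEQ ID NO:12 is written with degenerate nucleotide codes. Under the standard IUPAC convention, R denotes A or G, Y denotes C or T, and D denotes A, G or T. The following is a minimal sketch of how a candidate sequence could be compared position by position with such a degenerate sequence; the function name is illustrative, the example uses only the first twelve positions of SEQ ID NO:12, and the test sequences are hypothetical. It is offered as an illustration and not as an interpretation of the claims.

```python
# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatch_positions(degenerate: str, candidate: str) -> list:
    """1-based positions where candidate is not allowed by the degenerate sequence."""
    if len(degenerate) != len(candidate):
        raise ValueError("sequences must have the same length")
    return [i + 1
            for i, (code, base) in enumerate(zip(degenerate.upper(), candidate.upper()))
            if base not in IUPAC[code]]


prefix = "TRTCTAGCTCAD"  # first twelve positions of SEQ ID NO:12
print(mismatch_positions(prefix, "TATCTAGCTCAG"))  # hypothetical candidate
print(mismatch_positions(prefix, "TCTCTAGCTCAC"))  # hypothetical candidate
```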






WO 00/23606 



1/18 



PCT/US99/24646 




[Drawing sheets 2/18 to 10/18: graphics and sequence alignments not recoverable from the text extraction. Surviving row labels on the alignment sheets: Major, Hu N, Hu S, gori.]

Hu N (1240): [U3 repeat subtype order illegible] | P-R-U5 | gtat
Hu S (1120): 1|2|3|4|1|2|3|4|1|2|1 | P-R-U5 | gtat
Gori (880):  1|2|3|4|1 | P-R-U5 | gtat

Figure 6



Construct   Elements (5' to 3')       Plasmid
1           LTR | CAT                 LTR-CAT
2           Ups | CAT                 Ups-CAT
3           εp | CAT                  εp-CAT
4           HS2 | εp | CAT            HS2-εp-CAT
5           LTR | εp | CAT            LTR-εp-CAT
6           HS5 | εp | CAT            HS5-εp-CAT
7           LTR | HS5 | εp | CAT      LTR-HS5-εp-CAT

Figure 7



Figure 8 



[Bar chart, K562 cells; y-axis ticks 10 to 40. Legend: εp-, HS2-εp-, LTR-, LTR-εp-, HS5-εp-, LTR-HS5-εp-]

Figure 9



[Primer-pair schematic; surviving labels: U3 Enh; band sizes 1130, 1240, 380, 490]



1. To replace the LTRs or their component U3, R and U5 regions of retroviral vectors
designed for gene therapy of hereditary or acquired hematological diseases.

i.    U3 | R | U5
ii.   U3 | R | U5
iii.  U3 | R | U5
iv.   U3E | U3p | R | U5

Can be either the 5' or the 3' LTR or both the 5' and 3' LTRs of an
appropriate retroviral vector.

U3: the U3 enhancer and promoter of the 5'HS5 ERV-9 LTR
R: the R region of the 5'HS5 ERV-9 LTR
U5: the U5 region of the 5'HS5 ERV-9 LTR
U3E: the U3 enhancer of the 5'HS5 ERV-9 LTR
U3p, R and U5: the U3 promoter, R and U5 regions of appropriate non-5'HS5
ERV-9 LTRs.

2. To activate in hematopoietic cells the transcription of a cis-linked transgene spliced
in either viral or non-viral vectors.

i.     U3E | U3P | R | U5 | Gene
ii.    U3E | U3P | R | U5 | Gene
iii.   U3E | U3P | R | U5 | Gene
iv.    U3E | [illegible] | U5 | Gene
v.     U3E | U3P | R | gene
vi.    U3E | P | R | gene
vii.   U3E | [illegible] | R | gene
viii.  U3E | U3P | gene
ix.    U3E | gene

U3: the U3 enhancer and promoter of the 5'HS5 ERV-9 LTR
R: the R region of the 5'HS5 ERV-9 LTR
U5: the U5 region of the 5'HS5 ERV-9 LTR
U3E: the U3 enhancer of the 5'HS5 ERV-9 LTR
U3P: the U3 promoter of the 5'HS5 ERV-9 LTR
R and U5: the R and U5 regions of appropriate non-5'HS5 ERV-9 LTRs.
P: appropriate promoter other than the U3 promoter of the 5'HS5 ERV-9 LTR.

Design of the vectors:

i.    U3 | R | U5
ii.   U3 | R | U5
iii.  U3 | R | U5

Can be either the 5' or the 3' LTR or both the 5' and 3' LTRs of an
appropriate retroviral vector.

iv.   U3E | U3p | R | U5 | gene

Hatched box: the U3 insulator sequence.
E and P: appropriate enhancer and promoter sequences
gene: appropriate transgene
all other designations: the same as Figure [number illegible]
SEQUENCE LISTING

<110> Medical College of Georgia Research Institute, Inc 

<120> Long Terminal Repeat, Enhancer, and Insulator Sequences 
for Use in Recombinant Vectors 

<130> MCG 112 PCT 

<140> Not Yet Assigned 
<141> 1999-10-21 

<150> 60/105,256 
<151> 1998-10-22 

<160> 22 

<170> PatentIn Ver. 2.1

<210> 1 
<211> 1831 
<212> DNA 
<213> Homo sapiens 

<400> 1 

gtattgagag gtgacagcgt gctggcagtc ctcacagccc tcgctcgctc ttggcgcctc 60

ctctgcctgg gctcccacat tggtggcact tgaggagccc ttcagccggc cgctgcactg 120

tgggagccct tttctgggct ggccaaggcc agagccggct ccctcagctt gccaggaggt 180

gtggagggac agacgcgggc aggaaccggg ctgtgcgccg tgcttgaggg agttccgggt 240

gggcatgggc tccgaggacc ccgcactcgg agccgccagc cggccccacc ggccgcgggc 300

agtgaggggc ttagcacctg ggccagcagc tgctgtgctc aattcctcgc cgggccttag 360

ctgccttcct gcggggcagg gctcgggacc tgcagcgcgc catgcctgag cctccccacc 420

ttcatgggct cctgtgcggc ccgagcctcg ccgacgagcg ccgccccctg ctccagggca 480

cccagtccca tcgaccaccc aagggctgaa gagtgcgggc gccagcaagg ggactggcag 540

gcagctcccc ctgcagccca ggtgcgggat ccactgggtg aagccggcta ggtcctgagt 600

ttgctgggga tgcgaagaac ccttatgtct agataaggga ttgtaaatac accaattggc 660

actctgtatc tagctcaagg tttgtaaaca caccaatcag caccctgtgt ctagctcagg 720

gtttgtgaat gcaccaatca acactctatc tagctactct ggtggggcct tggagaacct 780

ttatgtctag ctcagggatt gtaaatacac caatcggcag tctgtatcta gctcaaggtt 840

tgtaaacaca ccaatcagca ccctgtgtct agctcagggt ttgtgaatgc accaatcaac 900

actctgtatc tagctactct ggtggggacg tggagaacct ttatgtctag ctcagggatt 960

gtaaatacac cactcggcag tctgtatcta gctcaaggtt tgtaaacaca ccaatcagca 1020

ccctgtgtct agctcagggt ttgtgaatgc accaatcaac actctgtatc tagctactct 1080

ggtgggactt ggagaacctt tgtgtggaca ctctgtatct agctaatctg gtggggacgt 1140

ggagaacctt tgtgtctagc tcatggattg taaatgcacc aatcagtgcc ctgtcaaaac 1200

agaccactgg gctctctacc aatcagcagg atgtgggtgg ggccagataa gagaataaaa 1260

gcaggctgcc cgagccagca gtggcaaccc gctcgggtcc ccttccacac tgtggaagct 1320

ttgttctttc gctctttgca ataaatcttg ctgctgctca ctgtttgggt ctacactgcc 1380






WO 00/23606 



PCT/US99/24646



tttatgagct gtaacgctca ccgcgaaggt ctgcagcttc actcttgaag ccagcgagac 1440 

cacgaaccca ccggaggaac gaacaactcc agaggcgccg cttaagagct ggaacgttca 1500 

ctgtgaaggt ctgcagcttc actcctgagc cagcgagacc acgaacccat cagaaggaag 1560 

aactcgaaca catccaaaca tcagaacgaa caactccaca cacgcagcct ttaagaactg 1620 

taacactcac cacgagggtc cccggcttca ttcttgaagt cagtgaaacc aagaacccac 1680 

caattccgga cacagtatgt cagaaacaat atgagtcact aaatcaatat acttctcaac 1740 

aacagccctt gcaattaact tggccatgtg actggttgtg actaaaataa tgtggagata 1800 

ataatgtgtt actccctaag gcagagtgcc c 1831 

<210> 2 

<211> 103 

<212> DNA 

<213> Homo sapiens 

<400> 2 

tcaaaacgga ccaataagct ctctgtaaaa tgggccaatc agcaggatgt gggtggggtc 60 
agataaggaa ataaaagcag gctgccagag ccagctgtga caa 103 

<210> 3 

<211> 87 

<212> DNA 

<213> Homo sapiens 

<400> 3 

tcaaaccact cggctctacc aatcagcagg atgtgggtgg ggccagataa gagaataaaa 60 
gcaggctgcc cgagccagca gtggcaa 87 

<210> 4 
<211> 105 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Epsilon 1.4 
phage 

<400> 4 

gacacaggtc agccttgacc aatgactttt aagtaccatg gagaacaggg ggccagaatt 60 
cggcagtaaa gaataaaagg ccagacagag aggcagcagc acata 105 

<210> 5 
<211> 1091 
<212> DNA 

<213> Artificial Sequence 



<220>

<223> Description of Artificial Sequence: consensus
sequence

<400> 5

tatgtctaga taagggattg taaatacacc aattggcact ctgtatctag ctcaaggttt 60
gtaaacacac caatcagcac cctgtgtcta gctcagggtt tgtgaatgca ccaatcaaca 120
ctctatctag ctactctggt ggggccttgg agaaccttta tgtctagctc agggattgta 180
aatacaccaa tcggcagtct gtatctagct caaggtttgt aaacacacca atcagcaccc 240
tgtgtctagc tcagggtttg tgaatgcacc aatcaacact ctgtatctag ctactctggt 300
ggggacgtgg agaaccttta tgtctagctc agggattgta aatacaccac tcggcagtct 360
gtatctagct caaggtttgt aaacacacca atcagcaccc tgtgtctagc tcagtatcta 420
gctaatctgg tggggangtg gagaaccttt gtgtctagct catggattgt aaatgcacca 480
atcagtgccc tgtcaaaaca gaccactggg ctcttaccaa tcagcaggat gtgggtgggg 540
ccagataaga gaataaaagc aggctgcccg agccagcagt ggcaacccgc tcgggtcccc 600
ttccacactg tggaagcttt gttctttcgc tctttgcaat aaatcttgct gctgctcact 660
gtttgggtct acactgcctt tatgagctgt aacgctcacc gcgaaggtct gcagcttcac 720
tcttgaagcc agcgagacca cgaacccacc gggaggaacg aacaactcca gaggcgccgc 780
cttaagagct ggaacgttca ctgtgaaggt ctgcagcttc actcctgagc cagcgagacc 840
acgaacccat cagaaggaag aaactccgaa cacatccaaa catcagaacg aacaaactcc 900
acacacgcag cctttaagaa ctgtaacact caccacgagg gtccccggct tcattcttga 960
agtcagtgaa accaagaacc caccaattcc ggacacagta tgtcagaaac aatatgagtc 1020
actaaatcaa tatacttctc aacaatttcc aacagccctt gcaattaact tggccatgtg 1080
actggttgtg a 1091

<210> 6
<211> 1043
<212> DNA

<213> Homo sapiens

<400> 6

tatgtctacc ataagggatt gtaaatacac caattggcac tctgtatcta gctcaaggtt 60
tgtaaacaca ccaatcagca ccctgtgtct agctcagggt ttgtgaatgc accaatcaac 120
actctatcta gctactctgg tggggccttg gagaaccttt atgtctagct cagggattgt 180
aaatacacca atcggcagtc tgtatctagc tcaaggtttg taaacacacc aatcagcacc 240
ctgtgtctag ctcagggttt gtgaatgcac caatcaacac tctgtatcta gctactctgg 300
tggggacgtg gagaaccttt atgtctagct cagggattgt aaatacacca ctcggcagtc 360
tgtatctagc tcaaggtttg taaacacacc aatcagcacc ctgtgtctag ctcatggatt 420
gtaaatgcac caatcagtgc cctgtcaaaa cagaccactg ggctctacca atcagcagga 480
tgtgggtggg gccagataag agaataaaag caggctgccc gagccagcag tggcaacccg 540
ctcgggtccc cttccacact gtggaagctt tgttctttcg ctctttgcaa taaatcttgc 600
tgctgctcac tgtttgggtc tacactgcct ttatgagctg taacgctcac cgcgaaggtc 660
tgcagcttca ctcttgaagc cagcgagacc acgaacccac cgggaggaac gaacaactcc 720
agaggcgccg ccttaagagc tggaacgttc actggtaaag gtctgcagct tcactcctga 780
gccagcgaga ccacgaaccc atcagaagga agaaactccg aacacatcca aacatcagaa 840
cgaacaaact ccacacacgc agcctttaag aactgtaaca ctcaccacga gggtccccgg 900
cttcattctt gaagtcagtg aaaccaagaa cccaccaatt ccggacacag tatgtcagaa 960
acaatatgag tcactaaatc aatatacttc tcaacaattt ccaacagccc ttgcaattaa 1020
cttggccatg tgactggttg tga 1043

<210> 7
<211> 801
<212> DNA
<213> Gorilla

<400> 7

tatgtctaga taagggattg taaatacacc aattggcact ctgtatctag ctcaaggttt 60
gtaaacacac caatcagcac cctgtgtcta gctcagggtt tgtgaatgca ccaatcaaca 120
ctctgtatct agctaatctg gtggggaagt ggagaacctt tgtgtctagc tcagggattg 180
taaacgcacc aatcagcacc ctgtcaaaac agaccactgg gctctaccaa tcagcaggat 240
gtgggtgggg ccagataaga gaataaaagc aggctgccca agccagcagt ggcaacgtgc 300
tcaggtcccc ttccacactg cggaagcttt gttctttcgc tctttgcaat aaatcttgct 360
gctgctcact gtttgggtct acactgcctt tacgagctat aacgctcacc cgaaggtctg 420
cagcttcact cttgaagcca gcgagaccac gaacccactg ggaggaacga acaactccag 480
acgcaccgcc ttaagagctg gaacgttcac tgtgaaggtc tgcagcttca ctcctgagcc 540
agcgagacca cgaacccatc agaaggaaga aactccgaac acatccaaac atcagaacga 600
acaaactcca cacacgcagc ctttaagaac tgtaacactc accacgaggg tcccgcggct 660
tcattcttga aagtcagtga aaccaagaac ctaccaattc ggacacagta tgtcagaaac 720
aatatgagtc actaaatcaa tatacttctc aacaatttcc aacagccctt gcaattaact 780
tggccatgtg actggttgtg a 801

<210> 8
<211> 41
<212> DNA

<213> Homo sapiens

<400> 8

tatctagctc agggattgta aatacaccaa tcggcagtct g 41



<210> 9
<211> 41
<212> DNA

<213> Homo sapiens

<400> 9

tgtctagctc aaggtttgta aacacaccaa tcagcaccct g 41

<210> 10
<211> 41
<212> DNA

<213> Homo sapiens

<400> 10

tatctagctc agggtttgtg aatgcaccaa tcaacactct g 41



<210> 11 
<211> 37 
<212> DNA 

<213> Homo sapiens 
<400> 11 

tgtctagcta ctctggtggg gacgtggaga accttta 37 

<210> 12
<211> 41
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: consensus
sequence

<400> 12

trtctagctc adggtttgtr aayrcaccaa tcagcactct g 41

<210> 13
<211> 41
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: consensus
sequence

<400> 13

tgtctagctm aaggtttgta aatgcaccaa tcagcactct g 41

<210> 14
<211> 41
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: consensus
sequence

<400> 14

trtctagctm arggwttgta aacrcaccaa tcagcactct g 41
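
The consensus sequences of SEQ ID NOs: 12 to 14 use IUPAC ambiguity codes (for example r, y, m, w, d). The following is a minimal Python sketch, not part of the listing as filed, of how a 41-base repeat could be compared position by position against such a consensus. The function names `clean` and `mismatches` are hypothetical, and the sequences are copied from SEQ ID NOs: 8, 9, 10 and 12 of this listing.

```python
# IUPAC nucleotide ambiguity codes, lower case as used in the listing.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "m": "ac", "k": "gt",
    "s": "cg", "w": "at", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}


def clean(seq):
    """Remove the spaces between 10-base groups and lower-case the residues."""
    return "".join(seq.split()).lower()


def mismatches(consensus, seq):
    """Return the 1-based positions where seq is not permitted by the consensus."""
    cons, s = clean(consensus), clean(seq)
    if len(cons) != len(s):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (c, b) in enumerate(zip(cons, s)) if b not in IUPAC[c]]


# Sequences as printed in this listing (illustrative use only).
SEQ12 = "trtctagctc adggtttgtr aayrcaccaa tcagcactct g"
REPEATS = {
    8: "tatctagctc agggattgta aatacaccaa tcggcagtct g",
    9: "tgtctagctc aaggtttgta aacacaccaa tcagcaccct g",
    10: "tatctagctc agggtttgtg aatgcaccaa tcaacactct g",
}

if __name__ == "__main__":
    for seq_id, repeat in REPEATS.items():
        print(seq_id, mismatches(SEQ12, repeat))
```

An empty list means the repeat is fully covered by the consensus. A consensus of this kind summarizes the predominant bases, so individual repeats may still differ from it at a few positions.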

<210> 15 
<211> 31 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
oligonucleotide 

<400> 15

actgtcgaca agcttctgac aaattattct t 

<210> 16 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence:
oligonucleotide 

<400> 16 

gatggatcca ctgaaagggc tcatgcaac 

<210> 17 
<211> 22 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
oligonucleotide 

<400> 17 

ctgagtttgc tggggatgcg aa 

<210> 18 
<211> 26 
<212> DNA
<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence:
oligonucleotide 

<400> 18 

gatttagtga ctcatattgt ttctga 

<210> 19 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence:
oligonucleotide 

<400> 19 

tgctgctgct cactgtttgg gtcta 

<210> 20 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence:
oligonucleotide 

<400> 20 

gggcactctg ccttagggag taaca 

<210> 21 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence:
oligonucleotide 

<400> 21 

actgtcgact tatgtattca agttcg 



<210> 22
<211> 27
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:
oligonucleotide

<400> 22

gatggatcca atagattttt gtcatct
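
Each record in this listing declares its length in the <211> field, so the residues printed after <400> can be counted and compared with that figure. The following Python is a minimal sketch of such a check, not part of the listing as filed. The function name `check_record` is hypothetical, and the example record is SEQ ID NO: 22 as printed above.

```python
import re


def check_record(record):
    """Return (seq_id, declared_length, counted_length) for one record."""
    seq_id = int(re.search(r"<210>\s*(\d+)", record).group(1))
    declared = int(re.search(r"<211>\s*(\d+)", record).group(1))
    # Everything after the <400> line is sequence data. Keeping letters only
    # drops the group spacing and any running residue counts at line ends.
    body = re.split(r"<400>\s*\d+", record, maxsplit=1)[1]
    counted = len(re.sub(r"[^a-z]", "", body.lower()))
    return seq_id, declared, counted


# SEQ ID NO: 22 as it appears in this listing (illustrative input).
RECORD_22 = """
<210> 22
<211> 27
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence:
oligonucleotide
<400> 22
gatggatcca atagattttt gtcatct
"""

if __name__ == "__main__":
    print(check_record(RECORD_22))
```

A record whose declared and counted lengths differ would point to a transcription problem in that sequence rather than to a property of the sequence itself.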